Better data, better AI


Lilac is an open-source tool that enables data and AI practitioners to improve their products by improving their data.

Try Lilac on HuggingFace Spaces, where we’ve preloaded popular datasets like OpenOrca. Try a semantic search for “As a language model” on the OpenOrca dataset!

Why use Lilac?

  • Understand your data with powerful search and filtering.

  • Collaborate with your team on a single, centralized dataset.

  • Apply best practices for data curation, like removing duplicates and PII to reduce dataset size and lower training cost and time.

  • See how your pipeline impacts your data using our diff viewer.
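To make the curation practices above concrete, here is a minimal sketch of exact-duplicate removal and email redaction in plain Python. It does not use Lilac's API. The helper names (`normalize`, `drop_exact_duplicates`, `redact_emails`), the email pattern, and the sample rows are all assumptions made for illustration, and real PII detection needs to cover far more than email addresses.

```python
import re

# Hypothetical helpers for illustration only; this is not Lilac's API.
# A deliberately simple email pattern; real PII detection is much broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def drop_exact_duplicates(rows: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized text, preserving order."""
    seen: set[str] = set()
    kept: list[str] = []
    for row in rows:
        key = normalize(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


# Made-up sample rows: the first two differ only in case and spacing.
rows = [
    "Contact me at jane@example.com for details.",
    "contact me at  jane@example.com for details.",
    "As a language model, I cannot do that.",
]
cleaned = [redact_emails(r) for r in drop_exact_duplicates(rows)]
print(cleaned)
```

Deduplicating before redaction means near-identical rows are dropped first, so the redaction step runs on fewer rows. Either order would give the same result on this sample.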

Get started

To get started, follow the installation steps. To familiarize yourself with Lilac, follow our Quickstart guide to visualize and enrich your first dataset.


For details on how Lilac works, see our guides on Datasets, Signals, Concepts, Embeddings, and Deployment.


Need help or have a feature suggestion? Join our community on Discord, or follow us on Twitter.