Lilac

Better data, better AI


Lilac is an open-source tool that enables data and AI practitioners to improve their products by improving their data.

Try Lilac on HuggingFace Spaces, where we’ve preloaded popular datasets like OpenOrca. Try a semantic search for “As a language model” on the OpenOrca dataset!

Why use Lilac?

  • Understand your data with powerful search and filtering.

  • Collaborate with your team on a single, centralized dataset.

  • Apply best practices for data curation, such as removing duplicates and PII, to reduce dataset size and lower training cost and time.

  • See how your pipeline changes your data with our diff viewer.
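Removing exact duplicates, one of the curation practices listed above, can be illustrated with a minimal sketch in plain Python. This is not Lilac's implementation or API: the functions `normalize` and `dedupe_exact` are hypothetical, and the sketch only catches copies that match after whitespace and case normalization. Near-duplicates need fuzzier techniques such as MinHash.

```python
import hashlib


def normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so trivially different copies match.
    return " ".join(text.split()).lower()


def dedupe_exact(rows: list[str]) -> list[str]:
    """Keep the first occurrence of each row, comparing hashes of normalized text."""
    seen: set[str] = set()
    kept: list[str] = []
    for row in rows:
        digest = hashlib.sha256(normalize(row).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(row)
    return kept


if __name__ == "__main__":
    rows = [
        "As a language model, I cannot do that.",
        "As a  language model, I cannot do that.",  # extra space: same after normalizing
        "The capital of France is Paris.",
    ]
    unique = dedupe_exact(rows)
    print(f"{len(rows)} rows -> {len(unique)} rows")  # prints "3 rows -> 2 rows"
```

Storing fixed-size digests rather than the full text keeps the memory needed for the `seen` set small even when individual rows are long.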


Get started

To get started, follow the installation steps. To familiarize yourself with Lilac, follow our Quickstart guide to visualize and enrich your first dataset.

Guides

For details on how Lilac works, see our guides on Datasets, Signals, Concepts, Embeddings, and Deployment.

Community

Need help or have a feature suggestion? Join our community: