Projects
Lilac projects are a way to organize your datasets and concepts into separate directories.
Project configurations are a way to specify which datasets, signals and concepts to run to create a Lilac instance.
An example project directory:
~/my_project
  lilac.yml
  datasets/
    open-orca/
  concept/
Project directory
When starting the webserver from the CLI or from Python, a project directory is created for you implicitly when you pass project_dir.
From CLI:
lilac start ~/my_project
From Python:
import lilac as ll
ll.start_server(project_dir='~/my_project')
In both cases, if ~/my_project doesn’t exist, the directory will be created with an empty project configuration:
# Lilac project config.
# See https://docs.lilacml.com/api_reference/index.html#lilac.Config for details.
datasets: []
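The create-if-missing behavior described above can be pictured with a small standard-library sketch. This is not Lilac's implementation: the helper name `ensure_project_dir` is hypothetical, and only the file name `lilac.yml` and the empty `datasets: []` config come from the text above.

```python
import os

# Contents of an empty project config, as shown above.
EMPTY_CONFIG = (
    '# Lilac project config.\n'
    '# See https://docs.lilacml.com/api_reference/index.html#lilac.Config for details.\n'
    'datasets: []\n'
)


def ensure_project_dir(project_dir: str) -> str:
    """Hypothetical helper: create the directory and an empty lilac.yml if missing."""
    # Expand a leading '~' so '~/my_project' resolves to the user's home directory.
    project_dir = os.path.abspath(os.path.expanduser(project_dir))
    os.makedirs(project_dir, exist_ok=True)

    config_path = os.path.join(project_dir, 'lilac.yml')
    # Never overwrite an existing configuration.
    if not os.path.exists(config_path):
        with open(config_path, 'w') as f:
            f.write(EMPTY_CONFIG)
    return config_path
```

Calling the helper a second time on the same directory leaves an existing `lilac.yml` untouched, which matches the "created only if it doesn't exist" wording above.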
If you want to create a Lilac project without starting a webserver, you can use init.
From CLI:
lilac init ~/my_project
From Python:
import lilac as ll
ll.init(project_dir='~/my_project')
Setting the project directory globally
If we don’t want to pass project_dir around, we can set the project directory globally by using the LilacEnvironment.LILAC_PROJECT_DIR environment variable, or the Python API set_project_dir.
# Use $HOME rather than a quoted '~', which the shell does not expand.
export LILAC_PROJECT_DIR="$HOME/my_project"
Python:
import lilac as ll
# This will set the `LILAC_PROJECT_DIR` environment variable globally.
# Any future calls to Lilac will use this project directory.
ll.set_project_dir('~/my_project')
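To illustrate how a globally set project directory could be picked up, here is a minimal sketch using only the standard library. The function `resolve_project_dir` and its fallback order (explicit argument, then the `LILAC_PROJECT_DIR` environment variable, then the current directory) are assumptions for illustration, not Lilac's actual lookup logic.

```python
import os
from typing import Optional


def resolve_project_dir(project_dir: Optional[str] = None) -> str:
    """Hypothetical lookup: explicit argument first, then the environment, then cwd."""
    candidate = project_dir or os.environ.get('LILAC_PROJECT_DIR') or '.'
    # A '~' stored literally in the variable (e.g. from a quoted export) is expanded here.
    return os.path.abspath(os.path.expanduser(candidate))


# Setting the variable once makes later calls independent of an explicit argument.
os.environ['LILAC_PROJECT_DIR'] = '~/my_project'
print(resolve_project_dir())                      # resolved from the environment
print(resolve_project_dir('/tmp/other_project'))  # explicit argument wins
```

Under these assumptions an explicit argument always takes precedence, so a single call can still target a different project without changing the global setting.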
Configuration
The lilac.yml file in the root of the project directory is a YAML instance of Config.
An example configuration:
# Lilac project config.
# See https://docs.lilacml.com/api_reference/index.html#lilac.Config for details.
datasets:
  - namespace: local
    name: glue
    source:
      dataset_name: glue
      config_name: ax
      source_name: huggingface
    embeddings:
      - path: premise
        embedding: gte-small
    signals:
      - path: premise
        signal:
          signal_name: pii
      - path: hypothesis
        signal:
          signal_name: pii
    settings:
      ui:
        media_paths:
          - premise
Let’s break the configuration down. The first thing you’ll see is the datasets configuration, which is an array of DatasetConfig.
Here we have one dataset with namespace local and name glue.
datasets:
  - namespace: local
    name: glue
The next section defines the data source. In this case, we’re reading the glue dataset from HuggingFace with the ax configuration:
source:
  dataset_name: glue
  config_name: ax
  source_name: huggingface
The next section defines the embeddings to be computed on certain fields. Here we’re computing the gte-small embedding over the premise field:
embeddings:
  - path: premise
    embedding: gte-small
The next section defines the signals to be computed on certain fields. Here we’re computing pii over premise and also over hypothesis:
signals:
  - path: premise
    signal:
      signal_name: pii
  - path: hypothesis
    signal:
      signal_name: pii
The last section defines the dataset settings. This configuration sets premise as the only media path to be shown in the UI. A media path is a field that renders in the larger section of the UI, as a larger text field.
settings:
  ui:
    media_paths:
      - premise
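To tie the sections together, the sketch below mirrors the example configuration as a plain Python dictionary and lists the work it describes for each dataset. The function `describe_config` is hypothetical and only walks the structure shown above; it does not load `lilac.yml` or call Lilac.

```python
# The example configuration from above, written as the nested structure the YAML describes.
config = {
    'datasets': [{
        'namespace': 'local',
        'name': 'glue',
        'source': {'dataset_name': 'glue', 'config_name': 'ax', 'source_name': 'huggingface'},
        'embeddings': [{'path': 'premise', 'embedding': 'gte-small'}],
        'signals': [
            {'path': 'premise', 'signal': {'signal_name': 'pii'}},
            {'path': 'hypothesis', 'signal': {'signal_name': 'pii'}},
        ],
        'settings': {'ui': {'media_paths': ['premise']}},
    }]
}


def describe_config(config: dict) -> list:
    """Hypothetical helper: return one human-readable line per configured task."""
    lines = []
    for dataset in config.get('datasets', []):
        dataset_id = f"{dataset['namespace']}/{dataset['name']}"
        lines.append(f"{dataset_id}: load from {dataset['source']['source_name']}")
        for emb in dataset.get('embeddings', []):
            lines.append(f"{dataset_id}: embed '{emb['path']}' with {emb['embedding']}")
        for sig in dataset.get('signals', []):
            lines.append(f"{dataset_id}: run {sig['signal']['signal_name']} on '{sig['path']}'")
        media = dataset.get('settings', {}).get('ui', {}).get('media_paths', [])
        if media:
            lines.append(f"{dataset_id}: media paths {', '.join(media)}")
    return lines


for line in describe_config(config):
    print(line)
```

Each dataset entry carries its own source, embeddings, signals, and settings, so adding a second dataset to the list would produce a second, independent group of lines.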