Sharing datasets#
Lilac-formatted datasets can also be shared by uploading them to, and downloading them from, HuggingFace datasets.
For example, the lilac-glaive dataset was uploaded from a local project and can be downloaded with the lilac download CLI command.
From the CLI#
Upload#
To upload a dataset from a local project:
lilac upload local/glaive --url_or_repo=lilacai/glaive
Optional arguments:
--project_dir: Specify a project directory. Defaults to env.LILAC_PROJECT_DIR.
--url_or_repo: Specify a url / repo to upload to. Defaults to 'lilac-{namespace}-{dataset_name}'.
--public: Make the dataset public. If the dataset is private, the user downloading it must have read
access and must specify a HuggingFace token.
--readme_suffix: An additional suffix to add to the readme file of the dataset.
--hf_token: A HuggingFace access token. Defaults to HF_ACCESS_TOKEN.
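The options above can be assembled programmatically when uploads are scripted. The following is a minimal sketch, not part of Lilac: `default_repo_name` and `build_upload_command` are hypothetical helpers that only build strings from the flags documented above, and passing `--public` as a bare flag is an assumption about how the CLI parses booleans.

```python
import shlex


def default_repo_name(dataset: str) -> str:
    """Return the documented default repo name, 'lilac-{namespace}-{dataset_name}'."""
    namespace, dataset_name = dataset.split('/', 1)
    return f'lilac-{namespace}-{dataset_name}'


def build_upload_command(dataset, url_or_repo=None, project_dir=None,
                         public=False, readme_suffix=None, hf_token=None):
    """Build the argument list for `lilac upload` (hypothetical helper)."""
    cmd = ['lilac', 'upload', dataset]
    if project_dir is not None:
        cmd.append(f'--project_dir={project_dir}')
    if url_or_repo is not None:
        cmd.append(f'--url_or_repo={url_or_repo}')
    if public:
        # Assumption: --public is accepted as a bare boolean flag.
        cmd.append('--public')
    if readme_suffix is not None:
        cmd.append(f'--readme_suffix={readme_suffix}')
    if hf_token is not None:
        cmd.append(f'--hf_token={hf_token}')
    return cmd


cmd = build_upload_command('local/glaive', url_or_repo='lilacai/glaive', public=True)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
print(default_repo_name('local/glaive'))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the printed command looks right.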
Download#
To download a dataset that was uploaded with the lilac upload command:
lilac download lilacai/glaive
Optional arguments:
--project_dir: Specify a project directory. Defaults to env.LILAC_PROJECT_DIR.
--dataset_namespace: A dataset namespace to use. Defaults to the namespace used on the uploaded
dataset.
--dataset_name: A dataset name to use. Defaults to the name used on the uploaded dataset.
--hf_token: A HuggingFace access token. Defaults to HF_ACCESS_TOKEN.
--overwrite: When true, overwrites any local dataset with the same namespace/name. When false,
an error is thrown if a dataset with the same namespace/name already exists.
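The defaulting and overwrite rules described above can be modeled in a few lines. This is an illustrative sketch of the documented behavior only, not Lilac's implementation: `resolve_download_target` is a hypothetical function, the set of existing datasets stands in for the local project, and `ValueError` is an assumed error type (the docs only say an error is thrown).

```python
def resolve_download_target(uploaded_namespace, uploaded_name, existing,
                            dataset_namespace=None, dataset_name=None,
                            overwrite=False):
    """Return the local 'namespace/name' a download would be written to.

    `existing` is a set of 'namespace/name' strings already in the project.
    """
    # Fall back to the namespace and name used on the uploaded dataset.
    namespace = dataset_namespace or uploaded_namespace
    name = dataset_name or uploaded_name
    target = f'{namespace}/{name}'
    if target in existing and not overwrite:
        # Assumed error type; the docs do not name one.
        raise ValueError(f'Dataset "{target}" already exists; pass overwrite=True.')
    return target


existing = {'local/glaive'}
# A new name avoids the collision without overwriting.
print(resolve_download_target('local', 'glaive', existing, dataset_name='glaive_copy'))
# Overwriting is allowed only when requested explicitly.
print(resolve_download_target('local', 'glaive', existing, overwrite=True))
```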
From Python#
The Python API mirrors the CLI commands above.
Upload#
To upload a dataset from Python:
import lilac as ll
ll.upload('local/glaive', url_or_repo='lilacai/glaive')
For more details, see the API reference for lilac.upload.
Download#
To download a dataset from Python:
import lilac as ll
ll.download('lilacai/glaive')
For more details, see the API reference for lilac.download.
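Putting the two calls together, a share-then-fetch workflow might look like the sketch below. The wrapper functions `share_dataset` and `fetch_dataset` are hypothetical, and the keyword arguments `public`, `hf_token`, and `overwrite` are assumed to match the CLI flags of the same names, as the statement that the Python API matches the CLI suggests; check the API reference before relying on them. The Lilac module is passed in as a parameter so the wrappers can be exercised without network access.

```python
import os
from typing import Any


def share_dataset(ll: Any, dataset: str, repo: str, public: bool = False) -> None:
    """Upload a local dataset, reading the token from HF_ACCESS_TOKEN if set."""
    token = os.environ.get('HF_ACCESS_TOKEN')
    # Assumption: keyword names mirror the CLI flags documented above.
    ll.upload(dataset, url_or_repo=repo, public=public, hf_token=token)


def fetch_dataset(ll: Any, repo: str, overwrite: bool = False) -> None:
    """Download a dataset that was uploaded with lilac upload."""
    token = os.environ.get('HF_ACCESS_TOKEN')
    # Assumption: keyword names mirror the CLI flags documented above.
    ll.download(repo, overwrite=overwrite, hf_token=token)


# Usage with the real library (requires lilac to be installed):
#   import lilac as ll
#   share_dataset(ll, 'local/glaive', 'lilacai/glaive')
#   fetch_dataset(ll, 'lilacai/glaive', overwrite=True)
```

Injecting the module also makes it straightforward to substitute a stub in tests.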