Apply a concept to a dataset#

Concepts can be applied to an entire dataset to rank and score documents according to relevance to a concept.

From the UI#

Before we can apply a concept to a dataset, we need to compute an embedding index. See Compute embeddings for details.

Search by concept#

Once an embedding index has been computed, we can now use the search box to search by a concept:

Dataset items will be sorted, descending, by the highest concept score for each item. Chunks of text will be highlighted according to their concept score:

Compute the concept for the whole column#

Search by concept does not produce a score for every item in the dataset. If we want to add a score for every item, click the blue “compute” chip next to the preview in the schema.

Once the concept score has been computed, you will see a new column in the schema, highlighted in blue:

From Python#

Before we can search by concept, we must first compute an embedding index. Let’s compute gte-small over the text field:

dataset = ll.get_dataset('local', 'imdb')
dataset.compute_embedding('gte-small', path='text')

Once this is complete, we can search by a concept. Let’s search by the lilac/positive-sentiment concept.

r = dataset.select_rows(
  ['text', 'label'],
  searches=[
    ll.ConceptSearch(
      concept_namespace='lilac',
      concept_name='positive-sentiment',
      path='text')],
  limit=5)

Search by concept does not produce a score for every item in the dataset. If we want to want to add a score for every item, we can call Dataset.compute_signal:

dataset.compute_concept(
  namespace='lilac',
  concept_name='positive-sentiment',
  embedding='gte-small',
  path='text')

Once this is complete, we can print the items with the enriched column:

r = dataset.select_rows(['*'], limit=5)
print(r.df())

Output:

                                                text label __hfsplit__  \
0  I rented I AM CURIOUS-YELLOW from my video sto...   neg       train
1  "I Am Curious: Yellow" is a risible and preten...   neg       train
2  If only to avoid making this type of film in t...   neg       train
3  This film was probably inspired by Godard's Ma...   neg       train
4  I would put this at the top of my list of film...   neg       train

         text.lilac/positive-sentiment/gte-small/v11
0  [{'score': 0.532249927520752, '__value__': {'s...
1  [{'score': 0.059535492211580276, '__value__': ...
2  [{'score': 0.026503529399633408, '__value__': ...
3  [{'score': 0.17499613761901855, '__value__': {...
4  [{'score': 0.09144255518913269, '__value__': {...