Schema

Certain updates to schemas, such as making a previously unfilterable attribute filterable, may require a re-index of the namespace. Filtered query performance using the updated attribute may be degraded for a few minutes after updating the schema until the re-index is complete.

Every namespace in turbopuffer has an implicit schema over its attributes, which is built over time as you upsert data.

You can customize this schema manually to enable or disable certain features, e.g. BM25 full-text search.

Schema fields

Every attribute has the following fields in its schema:

type string, default: inferred

The datatype of the attribute. Supported types:

  • ?string: Nullable string
  • ?uint: Nullable unsigned integer
  • []string: Array of strings
  • []uint: Array of unsigned integers

We'll be adding other data types soon. In the meantime, we recommend representing other data types as either strings or unsigned integers, e.g. a boolean as 0/1 or a date as a lexicographically sortable string.
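As a minimal sketch of that recommendation, the helpers below encode a boolean as 0/1 and a timestamp as a fixed-width UTC string whose string order matches chronological order. The function names and attribute names are hypothetical and not part of turbopuffer; treating naive datetimes as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def encode_bool(value: bool) -> int:
    """Represent a boolean as an unsigned integer (0 or 1)."""
    return 1 if value else 0


def encode_date(value: datetime) -> str:
    """Represent a timestamp as a fixed-width UTC string.

    Zero-padded fields ordered from year down to second mean that plain
    string comparison agrees with chronological order.
    """
    if value.tzinfo is None:
        # Assumption of this sketch: naive datetimes are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    utc = value.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat(timespec="seconds") + "Z"


# Hypothetical attributes, suitable for the ?uint and ?string types.
attributes = {
    "in_stock": encode_bool(True),
    "published_at": encode_date(datetime(2024, 3, 9, 14, 5, 0, tzinfo=timezone.utc)),
}
```

Normalizing every timestamp to UTC before formatting matters here: strings with mixed offsets would no longer sort chronologically.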


filterable boolean, default: true*

Whether or not the attribute can be used in filters. Unfilterable attributes don't have an index built for them, and are thus billed at 50% (see pricing).

* Defaults to false if full-text search is enabled on the attribute.


full_text_search boolean, default: false

Whether this attribute can be used as part of a BM25 full-text search. Requires the ?string or []string type. By default, BM25-enabled attributes are not filterable.

Can either be a boolean or an object with the following optional fields:

  • language (string): The language of the text. Defaults to english.
  • stemming (boolean): Language-specific stemming for the text. Defaults to false.
  • remove_stopwords (boolean): Removes common words from the text. Defaults to true.
  • case_sensitive (boolean): Whether searching is case-sensitive. Defaults to false.
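The field rules above can be sketched as a small client-side helper that builds one attribute's schema entry and applies the documented defaults. `schema_entry` is a hypothetical name, not a turbopuffer API; the entry shape follows the response payloads shown on this page.

```python
SUPPORTED_TYPES = {"?string", "?uint", "[]string", "[]uint"}
FULL_TEXT_TYPES = {"?string", "[]string"}


def schema_entry(type_, filterable=None, full_text_search=False):
    """Build one attribute's schema entry (hypothetical client-side helper)."""
    if type_ not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {type_!r}")

    # full_text_search may be a boolean or an object of optional settings.
    fts_enabled = full_text_search is True or isinstance(full_text_search, dict)
    if fts_enabled and type_ not in FULL_TEXT_TYPES:
        raise ValueError("full_text_search requires the ?string or []string type")

    if filterable is None:
        # filterable defaults to true, except when full-text search is enabled.
        filterable = not fts_enabled

    return {
        "type": type_,
        "filterable": filterable,
        "full_text_search": full_text_search,
    }


schema = {
    "my-number": schema_entry("?uint"),
    "my-text": schema_entry(
        "?string",
        full_text_search={"language": "english", "stemming": True},
    ),
}
```

Validating the type on the client is only a convenience in this sketch; the service remains the authority on what it accepts.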

Add new attributes

In most cases, the schema is inferred from the data you upsert. However, as part of an upsert, you can choose to specify the schema for an attribute (e.g. to enable BM25 full-text indexing).

See: Upsert with schema

Inspect the schema for a namespace

To retrieve the current schema for a namespace, make a GET request to /v1/vectors/:namespace/schema.

// GET /v1/vectors/:namespace/schema

// Response payload
{
  "my-number": {
    "type": "?uint",
    "filterable": true,
    "full_text_search": null
  },
  "my-text": {
    "type": "?string",
    "filterable": false,
    "full_text_search": {
      "language": "english",
      "stemming": false,
      "remove_stopwords": true,
      "case_sensitive": false
    }
  }
}
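A rough sketch of this request using only the Python standard library follows. The base URL and the bearer-token Authorization header are assumptions, since this section does not state them; `schema_request`, `fetch_schema`, and `filterable_attributes` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the API host is not stated in this section.
BASE_URL = "https://api.turbopuffer.com"


def schema_request(namespace: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a namespace's schema."""
    url = f"{BASE_URL}/v1/vectors/{quote(namespace, safe='')}/schema"
    # Assumption: bearer-token authentication.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="GET", headers=headers)


def fetch_schema(namespace: str, api_key: str) -> dict:
    """Send the request and decode the JSON response payload."""
    with urllib.request.urlopen(schema_request(namespace, api_key)) as response:
        return json.load(response)


def filterable_attributes(schema: dict) -> list[str]:
    """Return the sorted names of attributes that can be used in filters."""
    return sorted(name for name, field in schema.items() if field["filterable"])
```

Applied to the response payload above, `filterable_attributes` would return only `my-number`, because `my-text` has full-text search enabled and is not filterable.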

Update the schema of a namespace

To update the schema for a namespace, make a POST request to /v1/vectors/:namespace/schema. You can update more than one field at once by passing in multiple keys in the payload.

At the moment, only updates to the filterable field are supported (e.g. you can disable filtering for a discount).

// POST /v1/vectors/:namespace/schema

// Request payload
{
  "my-number": {
    "type": "?uint",
    "filterable": false, // Changing the filterable field from true -> false
    "full_text_search": null
  }
}

// Response payload
// Returns the entire schema, with updates applied.
{
  "my-number": {
    "type": "?uint",
    "filterable": false,
    "full_text_search": null
  },
  "my-text": {
    "type": "?string",
    "filterable": false,
    "full_text_search": {
      "language": "english",
      "stemming": false,
      "remove_stopwords": true,
      "case_sensitive": false
    }
  }
}
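The update can be sketched in Python as follows: copy the attribute's current entry, flip only its filterable field, and build the POST request. As before, the base URL and bearer-token header are assumptions not stated in this section, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the API host is not stated in this section.
BASE_URL = "https://api.turbopuffer.com"


def set_filterable(schema: dict, attribute: str, filterable: bool) -> dict:
    """Return a payload holding only the changed attribute's entry.

    The current schema is left untouched; only the filterable field differs.
    """
    entry = dict(schema[attribute])
    entry["filterable"] = filterable
    return {attribute: entry}


def update_schema_request(
    namespace: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST request that updates the schema."""
    url = f"{BASE_URL}/v1/vectors/{quote(namespace, safe='')}/schema"
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


current = {
    "my-number": {"type": "?uint", "filterable": True, "full_text_search": None},
}
payload = set_filterable(current, "my-number", False)
request = update_schema_request("my-namespace", "YOUR_API_KEY", payload)
# urllib.request.urlopen(request) would send it and return the full schema.
```

Starting from the entry returned by the schema endpoint keeps the type and full_text_search fields identical, so the only difference the service sees is the filterable change.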

Supported languages

turbopuffer supports language-aware stemming and stopword removal for full-text search. The following languages are supported:

  • arabic
  • danish
  • dutch
  • english
  • finnish
  • french
  • german
  • greek
  • hungarian
  • italian
  • norwegian
  • portuguese
  • romanian
  • russian
  • spanish
  • swedish
  • tamil
  • turkish