
Schema

turbopuffer maintains a schema for each namespace that records the type and indexing behaviour of each attribute.

The schema can be modified as you upsert documents. Types are automatically inferred from the data, but can be configured explicitly for types that can't be inferred (e.g. UUIDs) or to change indexing behaviour (e.g. enabling full-text search for an attribute).

Parameters

Each attribute's schema can specify the following fields at upsert time:

type (string, default: inferred)

The data type of the attribute. Supported types:

  • string: String
  • uint: Unsigned integer
  • uuid: 128-bit UUID
  • bool: Boolean
  • []string: Array of strings
  • []uint: Array of unsigned integers
  • []uuid: Array of UUIDs

All attributes are nullable by default, except id and vector, which are required. The vector attribute will become optional soon; until then, if you need a namespace without vectors, simply set each document's vector to a random float.

Most types can be inferred from the upsert payload, except uuid and []uuid, which need to be set explicitly. See UUID values for an example.
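The inference rules above can be illustrated with a small function. This is a sketch of the documented behaviour, not turbopuffer's actual inference code; the function name `infer_type` and the attribute name `owner_id` are hypothetical, and it covers regular attributes only (not `id` or `vector`).

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Guess a schema type for one attribute value (hypothetical sketch).

    Never returns "uuid" or "[]uuid": a UUID string looks like any other
    string, so those types have to be declared in the schema explicitly.
    """
    # Check bool before int: in Python, True and False are ints too.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int) and value >= 0:
        return "uint"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list) and value:
        element_types = {infer_type(v) for v in value}
        # Only homogeneous arrays of strings or unsigned integers are supported.
        if element_types in ({"string"}, {"uint"}):
            return "[]" + element_types.pop()
    raise ValueError(f"no supported type can be inferred for {value!r}")


# A UUID would be inferred as a plain string...
print(infer_type("550e8400-e29b-41d4-a716-446655440000"))  # string
# ...so it is declared explicitly in the attribute's schema instead.
schema = {"owner_id": {"type": "uuid"}}
```

Values such as negative integers, floats, empty lists, or mixed-type lists raise an error here because none of the listed types can represent them.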

We'll be adding other data types soon. In the meantime, we suggest representing other data types as either strings or unsigned integers, e.g. a date as a UNIX timestamp (uint) or as a lexicographically sortable string in RFC 3339 / ISO 8601 format.
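Both date encodings suggested above can be produced with the Python standard library. The helper names below are hypothetical; the sketch assumes timezone-aware datetimes and normalizes string output to UTC so that string order matches time order.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_timestamp(dt: datetime) -> int:
    """Encode a timezone-aware datetime as a uint: whole seconds since the epoch."""
    if dt < EPOCH:
        # Dates before 1970 would be negative and don't fit an unsigned integer.
        raise ValueError("dates before 1970 cannot be stored as a uint timestamp")
    return int(dt.timestamp())


def to_sortable_string(dt: datetime) -> str:
    """Encode a timezone-aware datetime as an RFC 3339 string in UTC.

    A fixed-width format in a single time zone makes lexicographic order
    match chronological order.
    """
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


launch = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(to_unix_timestamp(launch))   # 1704164645
print(to_sortable_string(launch))  # 2024-01-02T03:04:05Z
```

The integer form is compact and supports numeric comparison; the string form stays human-readable while still sorting correctly.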


filterable (boolean, default: true; false if full-text search is enabled)

Whether or not the attribute can be used in filters/WHERE clauses.

Attributes that aren't filterable don't have an index built for them and are therefore billed at a 50% discount (see pricing).
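The default for filterable depends on whether full-text search is enabled. A hypothetical helper, `is_filterable`, that resolves the effective setting for a single attribute's schema under the documented defaults might look like this:

```python
def is_filterable(attribute_schema: dict) -> bool:
    """Resolve whether an attribute is filterable, using the documented defaults.

    An explicit `filterable` value wins. Otherwise the attribute is filterable
    unless full-text search is enabled (given as true or as an options object).
    """
    explicit = attribute_schema.get("filterable")
    if explicit is not None:
        return bool(explicit)
    fts = attribute_schema.get("full_text_search")
    fts_enabled = fts is True or isinstance(fts, dict)
    return not fts_enabled


print(is_filterable({"type": "uint"}))                              # True
print(is_filterable({"type": "string", "full_text_search": True}))  # False
print(is_filterable({"type": "string", "full_text_search": True, "filterable": True}))  # True
```

Attributes for which this returns False would have no filter index and, per the note above, would be billed at the discounted rate.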


full_text_search (boolean, default: false)

Whether this attribute can be used as part of a BM25 full-text search. Requires the string or []string type. By default, BM25-enabled attributes are not filterable; you can override this by setting filterable: true.

Can either be a boolean for default settings, or an object with the following optional fields:

  • language (string): The language of the text. Defaults to english. See the list of supported languages below.
  • stemming (boolean): Whether to apply language-specific stemming to the text. Defaults to false (i.e. do not stem).
  • remove_stopwords (boolean): Whether to remove common words (stopwords) for the given language. Defaults to true (i.e. remove common words).
  • case_sensitive (boolean): Whether searching is case-sensitive. Defaults to false (i.e. case-insensitive).

If you require other types of full-text search options, please contact us.
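Because full_text_search accepts either a boolean or a partial options object, it can help to expand it client-side into the full set of options. The sketch below is hypothetical (`normalize_full_text_search` and `FTS_DEFAULTS` are illustrative names) and uses the defaults listed above; rejecting unknown option names is a choice of this sketch, not documented server behaviour.

```python
from typing import Optional, Union

# Defaults for each option, as documented above.
FTS_DEFAULTS = {
    "language": "english",
    "stemming": False,
    "remove_stopwords": True,
    "case_sensitive": False,
}


def normalize_full_text_search(setting: Union[bool, dict, None]) -> Optional[dict]:
    """Expand a full_text_search value into its complete set of options.

    Returns None when full-text search is disabled, mirroring the
    "full_text_search": null shown in the schema inspection response.
    """
    if setting is None or setting is False:
        return None
    if setting is True:
        return dict(FTS_DEFAULTS)
    if not isinstance(setting, dict):
        raise TypeError("full_text_search must be a boolean or an object")
    unknown = set(setting) - set(FTS_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown full_text_search options: {sorted(unknown)}")
    # Options given explicitly override the defaults.
    return {**FTS_DEFAULTS, **setting}


print(normalize_full_text_search({"stemming": True}))
# {'language': 'english', 'stemming': True, 'remove_stopwords': True, 'case_sensitive': False}
```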

Adding new attributes

New attributes can be added with an upsert. Documents written before the attribute was added will have it set to null.

In most cases, the schema is inferred from the data you upsert. However, as part of an upsert, you can specify an attribute's schema explicitly using the parameters above (e.g. to use UUID values or to enable BM25 full-text indexing).
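The sketch below shows one way to pair documents with explicit schema entries when building an upsert body. The request shape is an assumption: this page does not describe the upsert endpoint, so the top-level "upserts" and "schema" keys, the per-document "attributes" layout, and the helper name `build_upsert_payload` are all hypothetical and should be checked against the upsert API reference.

```python
import json
import uuid


def build_upsert_payload(documents: list, schema: dict) -> dict:
    """Bundle documents with explicit schema entries for selected attributes.

    ASSUMPTION: the "upserts" and "schema" keys and the per-document layout
    are illustrative; consult the upsert API reference for the exact shape.
    """
    for doc in documents:
        # id and vector are required on every document (see Parameters above).
        if "id" not in doc or "vector" not in doc:
            raise ValueError("every document needs an id and a vector")
    return {"upserts": documents, "schema": schema}


payload = build_upsert_payload(
    documents=[
        {
            "id": 1,
            "vector": [0.1, 0.2],
            "attributes": {
                # Sent as a string; the schema entry below marks it as a uuid.
                "owner_id": str(uuid.UUID("550e8400-e29b-41d4-a716-446655440000")),
                "title": "The quick brown fox",
            },
        }
    ],
    schema={
        "owner_id": {"type": "uuid"},
        "title": {"type": "string", "full_text_search": True},
    },
)
print(json.dumps(payload, indent=2))
```

Only attributes that need non-inferred settings appear in the schema object; everything else is left to inference.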

Changing existing attributes

The filterable setting can be changed online and in place by specifying the updated schema in an upsert.

Changes to other index settings, attribute type changes, and attribute deletions currently cannot be made in place. If you require such a schema change, consider exporting your documents and upserting them into a new namespace.

After enabling the filterable setting for an attribute, the index needs time to build before this attribute can be used in a filter.
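Before sending a schema with an upsert, you could check it against the current schema to catch changes that won't apply in place. The hypothetical helper below encodes the rules above: new attributes and filterable changes pass, any other differing field on an existing attribute is flagged. It does not detect deletions, since leaving an attribute out of an upsert schema is not a deletion.

```python
def non_in_place_changes(current: dict, requested: dict) -> list:
    """Return "attribute.field" entries that cannot be changed in place.

    `current` is a schema as returned by the inspect endpoint; `requested` is
    the schema you intend to send with an upsert. Values are compared
    literally, so normalize full_text_search first (e.g. true vs. its
    expanded options object).
    """
    problems = []
    for name, fields in requested.items():
        existing = current.get(name)
        if existing is None:
            continue  # new attribute: older documents will read it as null
        for field, value in fields.items():
            if field == "filterable":
                continue  # supported as an online, in-place change
            if existing.get(field) != value:
                problems.append(f"{name}.{field}")
    return sorted(problems)


current = {"my-number": {"type": "uint", "filterable": True, "full_text_search": None}}
print(non_in_place_changes(current, {"my-number": {"filterable": False}}))  # []
print(non_in_place_changes(current, {"my-number": {"type": "string"}}))     # ['my-number.type']
```

A non-empty result signals that the export-and-reupsert route into a new namespace is needed.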

Inspect

To retrieve the current schema for a namespace, make a GET request to /v1/namespaces/:namespace/schema.

// GET /v1/namespaces/:namespace/schema

// Response payload
{
  "id": {
    "type": "uint"
  },
  "my-number": {
    "type": "uint",
    "filterable": true,
    "full_text_search": null
  },
  "my-text": {
    "type": "string",
    "filterable": false,
    "full_text_search": {
      "language": "english",
      "stemming": false,
      "remove_stopwords": true,
      "case_sensitive": false
    }
  }
}
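A minimal Python sketch of calling the inspect endpoint and summarizing its response follows, using only the standard library. The base URL `https://api.turbopuffer.com` and the bearer-token `Authorization` header are assumptions not stated on this page, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: the base URL and bearer-token auth header are not stated on this page.
API_BASE = "https://api.turbopuffer.com"


def schema_request(namespace: str, api_key: str) -> urllib.request.Request:
    """Build the GET /v1/namespaces/:namespace/schema request without sending it."""
    path = "/v1/namespaces/" + urllib.parse.quote(namespace, safe="") + "/schema"
    return urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_schema(namespace: str, api_key: str) -> dict:
    """Send the request and decode the JSON schema (performs network I/O)."""
    with urllib.request.urlopen(schema_request(namespace, api_key)) as resp:
        return json.load(resp)


def summarize_schema(schema: dict) -> dict:
    """Group attribute names by the indexes reported for them.

    Only attributes whose entry explicitly reports filterable: true are listed;
    the sample response above omits the field for id.
    """
    return {
        "filterable": sorted(n for n, a in schema.items() if a.get("filterable")),
        "full_text_search": sorted(
            n for n, a in schema.items() if a.get("full_text_search")
        ),
    }
```

Applied to the sample response above, `summarize_schema` would report my-number as filterable and my-text as full-text searchable.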

Supported languages

turbopuffer currently supports language-aware stemming and stopword removal for full-text search. The following languages are supported:

  • arabic
  • danish
  • dutch
  • english
  • finnish
  • french
  • german
  • greek
  • hungarian
  • italian
  • norwegian
  • portuguese
  • romanian
  • russian
  • spanish
  • swedish
  • tamil
  • turkish

Other languages can be supported by contacting us.
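A client-side pre-flight check can catch an unsupported language before an upsert. The sketch below is hypothetical (`check_language` is an illustrative name); it assumes the language option takes the lowercase names listed above and matches them exactly, because this page does not say whether other spellings are accepted.

```python
# The languages listed above, assumed to be spelled exactly as the option expects.
SUPPORTED_LANGUAGES = frozenset({
    "arabic", "danish", "dutch", "english", "finnish", "french",
    "german", "greek", "hungarian", "italian", "norwegian", "portuguese",
    "romanian", "russian", "spanish", "swedish", "tamil", "turkish",
})


def check_language(language: str = "english") -> str:
    """Reject a full_text_search language that isn't in the supported list.

    Names are matched exactly, so "English" is rejected rather than
    silently lowercased.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language {language!r}; supported: {sorted(SUPPORTED_LANGUAGES)}"
        )
    return language
```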

© 2024 turbopuffer Inc.