Certain updates to schemas, such as making a previously unfilterable attribute filterable, may require a re-index of the namespace. Filtered query performance using the updated attribute may be degraded for a few minutes after updating the schema until the re-index is complete.
Every namespace in turbopuffer has an implicit schema over its attributes, which is built over time as you upsert data.
You can customize this schema manually to enable or disable certain features, e.g. BM25 full-text search.
Every attribute has the following fields in its schema:
type: The data type of the attribute. Supported types:
  string: String
  uint: Unsigned integer
  uuid: 128-bit UUID
  bool: Boolean
  []string: Array of strings
  []uint: Array of unsigned integers
  []uuid: Array of UUIDs
Most types can be inferred from the upsert payload, except uuid and []uuid, which need to be set explicitly. See UUID values for an example.
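To see why uuid and []uuid have to be set explicitly, here is a rough sketch of what payload-based inference could look like. It is an illustration only and not turbopuffer's actual inference logic; the function name infer_type is hypothetical. A UUID arrives in the payload as an ordinary string, so any value-based inference of this kind would classify it as string.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Guess a schema type name from a JSON-decoded Python value (hypothetical helper)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        if value < 0:
            raise ValueError("only unsigned integers are supported")
        return "uint"
    if isinstance(value, str):
        # A UUID written as text is indistinguishable from any other string here.
        return "string"
    if isinstance(value, list) and value:
        inner = {infer_type(v) for v in value}
        # Only homogeneous arrays of strings or unsigned integers map to a listed type.
        if len(inner) == 1 and inner <= {"string", "uint"}:
            return "[]" + inner.pop()
    raise ValueError(f"cannot infer a schema type for {value!r}")
```

Under this sketch, a value such as "550e8400-e29b-41d4-a716-446655440000" is inferred as string, which is why the uuid type must come from an explicit schema.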
We'll be adding other data types soon. In the meantime, we'd recommend representing other data types as either strings or unsigned integers, e.g. date as a lexicographically sortable string.
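As a small illustration of the date recommendation, the sketch below encodes dates as ISO 8601 strings (YYYY-MM-DD). The helper name date_to_sortable is hypothetical, and ISO 8601 is one possible encoding rather than a required one. Because every component is zero-padded, plain string ordering matches chronological ordering.

```python
from datetime import date


def date_to_sortable(d: date) -> str:
    """Encode a date as a zero-padded ISO 8601 string (hypothetical helper)."""
    return d.isoformat()


dates = [date(2024, 11, 5), date(2023, 1, 31), date(2024, 2, 9)]

# Sorting the encoded strings lexicographically yields chronological order.
encoded = sorted(date_to_sortable(d) for d in dates)
```

An unpadded format such as 2024-2-9 would not have this property, since "2024-2-9" sorts after "2024-11-5" as a string.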
full_text_search: Whether this attribute can be used as part of a BM25 full-text search. Requires the string or []string type, and by default, BM25-enabled attributes are not filterable.
Can either be a boolean or an object with the following optional fields:
  language (string): The language of the text. Defaults to english.
  stemming (boolean): Language-specific stemming for the text. Defaults to false.
  remove_stopwords (boolean): Removes common words from the text. Defaults to true.
  case_sensitive (boolean): Whether searching is case-sensitive. Defaults to false.
In most cases, the schema is inferred from the data you upsert. However, as part of an upsert, you can choose to specify the schema for an attribute (e.g. to use UUID values or enable BM25 full-text indexing).
See: Upsert with schema
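The documented full-text search defaults can be captured in a small client-side helper. This is a sketch, not part of any turbopuffer client: the name expand_full_text_search is hypothetical, and it assumes that passing true is equivalent to passing an object with every field left at its default.

```python
from typing import Optional, Union

# Defaults as documented for the optional full_text_search fields.
FTS_DEFAULTS = {
    "language": "english",
    "stemming": False,
    "remove_stopwords": True,
    "case_sensitive": False,
}


def expand_full_text_search(value: Union[bool, dict, None]) -> Optional[dict]:
    """Expand a full_text_search setting into an object with every field set."""
    if value is None or value is False:
        return None  # full-text search is not enabled for this attribute
    if value is True:
        # Assumption: a bare `true` means "enabled with all defaults".
        return dict(FTS_DEFAULTS)
    unknown = set(value) - set(FTS_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown full_text_search fields: {sorted(unknown)}")
    # Explicit fields override the defaults; omitted fields keep them.
    return {**FTS_DEFAULTS, **value}
```

For example, expand_full_text_search({"stemming": True}) keeps language as english while turning stemming on.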
To retrieve the current schema for a namespace, make a GET request to /v1/vectors/:namespace/schema.
// GET /v1/vectors/:namespace/schema
// Response payload
{
  "id": {
    "type": "uint"
  },
  "my-number": {
    "type": "uint",
    "filterable": true,
    "full_text_search": null
  },
  "my-text": {
    "type": "string",
    "filterable": false,
    "full_text_search": {
      "language": "english",
      "stemming": false,
      "remove_stopwords": true,
      "case_sensitive": false
    }
  }
}
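A minimal sketch of issuing this GET request with Python's standard library follows. The base URL and the Bearer-token Authorization header are assumptions that this page does not state, and the function names are hypothetical; adjust them to match your account setup.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; this page only specifies the path.
BASE_URL = "https://api.turbopuffer.com"


def build_get_schema_request(namespace: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a namespace's schema."""
    path = f"/v1/vectors/{urllib.parse.quote(namespace, safe='')}/schema"
    return urllib.request.Request(
        BASE_URL + path,
        # Assumption: Bearer-token authentication.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_schema(namespace: str, api_key: str) -> dict:
    """Send the request and decode the JSON schema from the response body."""
    with urllib.request.urlopen(build_get_schema_request(namespace, api_key)) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the URL and header logic easy to inspect without making a network call.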
To update the schema for a namespace, make a POST request to /v1/vectors/:namespace/schema. You can update multiple attributes at once by passing multiple keys in the payload.
At the moment, only updates to the filterable field are supported (e.g. you can disable filtering on an attribute for a discount).
// POST /v1/vectors/:namespace/schema
// Request payload
{
  "my-number": {
    "type": "uint",
    "filterable": false, // Changing the filterable field from true -> false
    "full_text_search": null
  }
}
// Response payload
// Returns the entire schema, with updates applied.
{
  "id": {
    "type": "uint"
  },
  "my-number": {
    "type": "uint",
    "filterable": false,
    "full_text_search": null
  },
  "my-text": {
    "type": "string",
    "filterable": false,
    "full_text_search": {
      "language": "english",
      "stemming": false,
      "remove_stopwords": true,
      "case_sensitive": false
    }
  }
}
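One way to assemble the request payload is to start from the schema returned by the GET endpoint and change only the filterable flag, since that is the only field that can currently be updated. The sketch below does this in plain Python; build_filterable_update is a hypothetical helper, not part of any turbopuffer client.

```python
import copy
import json


def build_filterable_update(schema: dict, changes: dict) -> dict:
    """Return a payload with one entry per attribute named in `changes`.

    `schema` is the current schema as returned by the GET endpoint and
    `changes` maps attribute names to their new filterable value.
    """
    payload = {}
    for name, filterable in changes.items():
        if name not in schema:
            raise KeyError(f"attribute {name!r} is not in the current schema")
        # Copy the existing entry so type and full_text_search stay untouched.
        entry = copy.deepcopy(schema[name])
        entry["filterable"] = bool(filterable)
        payload[name] = entry
    return payload


current = {
    "id": {"type": "uint"},
    "my-number": {"type": "uint", "filterable": True, "full_text_search": None},
}
payload = build_filterable_update(current, {"my-number": False})

# Bytes suitable for use as the body of the POST request.
body = json.dumps(payload).encode("utf-8")
```

Passing several names in changes produces a payload with several keys, which matches the multi-key update described above.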
turbopuffer supports language-aware stemming and stopword removal for full-text search. The following languages are supported:
arabic
danish
dutch
english
finnish
french
german
greek
hungarian
italian
norwegian
portuguese
romanian
russian
spanish
swedish
tamil
turkish
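If you build schemas programmatically, it can help to check the language value on the client before sending it. The sketch below hard-codes the list above; the helper name check_language is hypothetical, and the comparison is deliberately exact because this page does not say whether language names are case-insensitive.

```python
# Languages listed above as supported for stemming and stopword removal.
SUPPORTED_LANGUAGES = frozenset({
    "arabic", "danish", "dutch", "english", "finnish", "french",
    "german", "greek", "hungarian", "italian", "norwegian", "portuguese",
    "romanian", "russian", "spanish", "swedish", "tamil", "turkish",
})


def check_language(language: str) -> str:
    """Return `language` unchanged if it is in the supported list, else raise."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language {language!r}; "
            f"expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return language
```

The list should be kept in sync with this page, since it may grow over time.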