turbopuffer maintains a schema for each namespace with type and indexing behaviour for each attribute.
The schema can be modified as you upsert documents. Types are inferred automatically from the data, but the schema can be configured explicitly for types that can't be inferred (e.g. UUIDs) or for indexing behaviour (e.g. full-text search for an attribute).
Every attribute can have the following fields in its schema specified at upsert time:
type: The data type of the attribute. Supported types:
string
: String
uint
: Unsigned integer
uuid
: 128-bit UUID
bool
: Boolean
[]string
: Array of strings
[]uint
: Array of unsigned integers
[]uuid
: Array of UUIDs

All attribute types are nullable by default, except id and vector, which are required. vector will become an optional attribute soon. If you need a namespace without a vector, simply set vector to a random float.
Most types can be inferred from the upsert payload, except uuid and []uuid, which need to be set explicitly. See UUID values for an example.
We'll be adding other data types soon. In the meantime, we suggest representing other data types as either strings or unsigned integers, e.g. date as a UNIX timestamp or lexicographically sortable string (RFC 3339/ISO 8601 format).
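The two date representations suggested above can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names to_unix_timestamp and to_rfc3339 are hypothetical and not part of any turbopuffer client.

```python
from datetime import datetime, timezone


def to_unix_timestamp(dt: datetime) -> int:
    """Encode a timezone-aware datetime as whole seconds since the UNIX epoch.

    The result is meant to be stored in a uint attribute, so dates before
    1970 are rejected rather than silently producing a negative number.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    seconds = int(dt.timestamp())
    if seconds < 0:
        raise ValueError("dates before 1970-01-01 do not fit in a uint")
    return seconds


def to_rfc3339(dt: datetime) -> str:
    """Encode a timezone-aware datetime as a fixed-width UTC RFC 3339 string.

    Normalising to UTC and using a fixed width keeps string order equal to
    chronological order.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created = datetime(2024, 1, 15, 12, 30, 0, tzinfo=timezone.utc)
print(to_unix_timestamp(created))  # 1705321800
print(to_rfc3339(created))         # 2024-01-15T12:30:00Z
```

The integer form suits a uint attribute and the string form suits a string attribute. Either works only if every document in the namespace uses the same encoding.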
full_text_search: Whether this attribute can be used as part of a BM25 full-text search. Requires the string or []string type. By default, BM25-enabled attributes are not filterable; you can override this by setting filterable: true.
Can either be a boolean for default settings, or an object with the following optional fields:
language (string): The language of the text. Defaults to english. See the list of supported languages below.
stemming (boolean): Language-specific stemming for the text. Defaults to false (i.e. do not stem).
remove_stopwords (boolean): Removes common words from the text based on language. Defaults to true (i.e. remove common words).
case_sensitive (boolean): Whether searching is case-sensitive. Defaults to false (i.e. case-insensitive).

If you require other types of full-text search options, please contact us.
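To make the boolean-or-object form concrete, the sketch below expands a full_text_search setting into the full set of options using the defaults listed above. It is a client-side illustration only: the helper resolve_full_text_search is hypothetical, turbopuffer applies the real defaults on the server, and treating false as "no full-text index" (null) is an assumption.

```python
# Defaults as documented above for the full_text_search options.
FTS_DEFAULTS = {
    "language": "english",
    "stemming": False,
    "remove_stopwords": True,
    "case_sensitive": False,
}


def resolve_full_text_search(setting):
    """Expand a boolean or partial options object into explicit options.

    Hypothetical helper for illustration; not part of any turbopuffer client.
    """
    if setting is None or setting is False:
        return None  # assumption: disabled is represented as null
    if setting is True:
        return dict(FTS_DEFAULTS)
    if not isinstance(setting, dict):
        raise TypeError("full_text_search must be a boolean or an object")
    unknown = set(setting) - set(FTS_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown full_text_search options: {sorted(unknown)}")
    # Explicit options override the defaults; everything else is filled in.
    return {**FTS_DEFAULTS, **setting}


print(resolve_full_text_search(True))
print(resolve_full_text_search({"language": "french", "stemming": True}))
```

Passing true yields the same four options shown for my-text in the schema response further down, while a partial object only overrides the fields it names.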
New attributes can be added with an upsert. All documents written before that upsert will have the attribute set to null.
In most cases, the schema is inferred from the data you upsert. However, as part of an upsert, you can choose to specify the schema for attributes through the above parameters (e.g. to use UUID values or enable BM25 full-text indexing).
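Because uuid and []uuid are never inferred, it can help to derive the explicit part of the schema from the attributes you are about to upsert. The sketch below does that with a simple heuristic. The helper names and the sample attribute names are hypothetical, and how the resulting object is attached to an upsert request is not shown in this section, so that step is left out.

```python
import uuid


def looks_like_uuid(value) -> bool:
    """Return True if value is a string that parses as a UUID."""
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def explicit_schema_for(attributes: dict) -> dict:
    """Build schema entries for attributes whose type must be set explicitly.

    Only uuid and []uuid are emitted; every other type is left to inference.
    """
    schema = {}
    for name, value in attributes.items():
        if looks_like_uuid(value):
            schema[name] = {"type": "uuid"}
        elif isinstance(value, list) and value and all(looks_like_uuid(v) for v in value):
            schema[name] = {"type": "[]uuid"}
    return schema


attributes = {
    "owner": "550e8400-e29b-41d4-a716-446655440000",
    "members": [
        "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
        "6ba7b811-9dad-11d1-80b4-00c04fd430c8",
    ],
    "title": "hello",
    "count": 3,
}
print(explicit_schema_for(attributes))
```

The heuristic treats any string that parses as a UUID as one, so a plain string attribute that happens to hold UUID-shaped text would need to be excluded by hand.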
We support online, in-place changes of the filterable setting, by setting the schema in an upsert.
Other index settings changes, attribute type changes, and attribute deletions currently cannot be done in-place. Consider exporting documents and upserting into a new namespace if you require a schema change.
After enabling the filterable setting for an attribute, the index needs time to build before this attribute can be used in a filter.
To retrieve the current schema for a namespace, make a GET request to /v1/namespaces/:namespace/schema.
// GET /v1/namespaces/:namespace/schema
// Response payload
{
  "id": {
    "type": "uint"
  },
  "my-number": {
    "type": "uint",
    "filterable": true,
    "full_text_search": null
  },
  "my-text": {
    "type": "string",
    "filterable": false,
    "full_text_search": {
      "language": "english",
      "stemming": false,
      "remove_stopwords": true,
      "case_sensitive": false
    }
  }
}
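A minimal sketch of calling this endpoint from Python with only the standard library follows. The base URL https://api.turbopuffer.com and the Bearer-token Authorization header are assumptions not stated in this section, so check them against your account setup. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.turbopuffer.com"  # assumption: not stated in this section


def build_schema_request(namespace: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for a namespace's schema."""
    url = f"{BASE_URL}/v1/namespaces/{namespace}/schema"
    # Assumption: the API key is sent as a Bearer token.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="GET", headers=headers)


def fetch_schema(namespace: str, api_key: str) -> dict:
    """Send the request and decode the JSON response payload."""
    with urllib.request.urlopen(build_schema_request(namespace, api_key)) as resp:
        return json.load(resp)


def full_text_attributes(schema: dict) -> list:
    """Names of attributes whose full_text_search setting is not null."""
    return sorted(
        name for name, spec in schema.items() if spec.get("full_text_search")
    )


# Applied to the response payload shown above.
sample = {
    "id": {"type": "uint"},
    "my-number": {"type": "uint", "filterable": True, "full_text_search": None},
    "my-text": {
        "type": "string",
        "filterable": False,
        "full_text_search": {
            "language": "english",
            "stemming": False,
            "remove_stopwords": True,
            "case_sensitive": False,
        },
    },
}
print(full_text_attributes(sample))  # ['my-text']
```

Building the request separately from sending it makes the URL and headers easy to inspect before any network call is made.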
turbopuffer currently supports language-aware stemming and stopword removal for full-text search. The following languages are supported:
arabic
danish
dutch
english
finnish
french
german
greek
hungarian
italian
norwegian
portuguese
romanian
russian
spanish
swedish
tamil
turkish
If you need support for another language, please contact us.