A namespace is an isolated set of documents and is implicitly created when the first document is inserted. Within a namespace, documents are uniquely referred to by their ID. Upserting a document overwrites any existing document with the same ID.
A namespace name can only contain ASCII alphanumeric characters plus the -, _, and . special characters, and cannot be longer than 128 characters (i.e. it must match [A-Za-z0-9-_.]{1,128}).
For performance, we recommend creating a namespace per isolated document space instead of filtering when possible. See Performance.
If a write request returns OK, data is guaranteed to be atomically and durably written to object storage. By default, writes are immediately visible to queries. You can read more about how upserts work on the Architecture page.
Each write request can have a maximum payload size of 256 MB. To maximize throughput and minimize cost, we recommend writing in large batches.
Every write must be associated with a document ID. Document IDs are unsigned 64-bit integers, 128-bit UUIDs, or strings. Mixing ID types in a namespace is not supported.
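As an illustration, a hypothetical client-side helper (not part of any official SDK) could verify that a batch of IDs uses a single supported type before sending a write:

```python
import uuid

def check_id_types(ids):
    """Verify all document IDs in a batch share one supported type.

    turbopuffer IDs are unsigned 64-bit integers, UUIDs, or strings;
    mixing types within a namespace is not supported.
    """
    kinds = set()
    for doc_id in ids:
        if isinstance(doc_id, bool):  # bool is a subclass of int; reject it
            raise TypeError(f"unsupported ID type: {doc_id!r}")
        if isinstance(doc_id, int):
            if not 0 <= doc_id < 2**64:
                raise ValueError(f"integer ID out of u64 range: {doc_id}")
            kinds.add("u64")
        elif isinstance(doc_id, uuid.UUID):
            kinds.add("uuid")
        elif isinstance(doc_id, str):
            kinds.add("string")
        else:
            raise TypeError(f"unsupported ID type: {doc_id!r}")
    if len(kinds) > 1:
        raise ValueError(f"mixed ID types in one batch: {sorted(kinds)}")

check_id_types([1, 2, 3])   # ok: all u64
check_id_types(["a", "b"])  # ok: all strings
```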
turbopuffer supports the following types of writes:
Upserts documents in a column-based format. This field is an object, where each key is the name of a column, and each value is an array of values for that column.
The id key is required and must contain an array of document IDs.
The vector key is required if the namespace has a vector index. For non-vector namespaces, this key should be omitted. If present, it must contain an array of vectors.
Any other keys will be stored as attributes.
Each column must be the same length. When a document doesn't have a value for a given column, pass null.
Example: {"id": [1, 2], "vector": [[1, 2, 3], [4, 5, 6]], "name": ["foo", "bar"]}
Note: the v1 write API used null vectors to represent deletes. This is no longer the case in the v2 API; use the deletes field instead.
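Since applications often hold documents in row shape, a small sketch (hypothetical helper, not part of any SDK) of converting rows into the column-based format, padding missing values with null, could be:

```python
def rows_to_columns(rows):
    """Convert row-shaped documents into the column-based upsert format.

    Every column must have the same length, so documents missing a key
    get None (serialized as JSON null) in that column.
    """
    columns = {key: [] for row in rows for key in row}
    for row in rows:
        for key, values in columns.items():
            values.append(row.get(key))
    return columns

rows = [
    {"id": 1, "vector": [1, 2, 3], "name": "foo"},
    {"id": 2, "vector": [4, 5, 6]},  # no "name": padded with None
]
print(rows_to_columns(rows))
# {'id': [1, 2], 'vector': [[1, 2, 3], [4, 5, 6]], 'name': ['foo', None]}
```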
Upserts documents in a row-based format. Each row is an object with an id key and a number of other keys.
The id key is required and must contain a document ID.
The vector key is required if the namespace has a vector index. For non-vector namespaces, this key should be omitted. If present, it must contain a vector.
Any other keys will be stored as attributes.
Example: [{"id": 1, "vector": [1, 2, 3], "name": "foo"}, {"id": 2, "vector": [4, 5, 6], "name": "bar"}]
Patches documents in a column-based format. Identical to upsert_columns, but instead of overwriting entire documents, only the specified keys are written.
The vector key currently cannot be patched.
Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.
Example: {"id": [1, 2], "name": ["baz", "qux"]}
Patches documents in a row-based format. Identical to upsert_rows, but instead of overwriting entire documents, only the specified keys are written.
The vector key currently cannot be patched.
Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.
Example: [{"id": 1, "name": "baz"}, {"id": 2, "name": "qux"}]
Deletes documents by ID. Must be an array of document IDs.
Example: [1, 2, 3]
You can delete documents that match a filter using delete_by_filter. It has the same syntax as the filters parameter in the query API.
Example: ["page_id", "Eq", "123"]
The function used to calculate vector similarity. Possible values are cosine_distance or euclidean_squared.
cosine_distance is defined as 1 - cosine_similarity and ranges from 0 to 2. Lower is better.
euclidean_squared is defined as sum((x - y)^2). Lower is better.
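The two metrics can be sketched in plain Python (no SDK involved):

```python
import math

def cosine_distance(x, y):
    """1 - cosine_similarity; ranges from 0 (same direction) to 2 (opposite)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

def euclidean_squared(x, y):
    """sum((x - y)^2); skipping the square root preserves the ranking."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

print(cosine_distance([1, 0], [0, 1]))          # 1.0 (orthogonal)
print(cosine_distance([1, 0], [-1, 0]))         # 2.0 (opposite)
print(euclidean_squared([1, 2, 3], [4, 5, 6]))  # 27
```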
Copies all documents from a source namespace into this namespace. This operation is currently limited to copying within the same region and organization. The initial request currently cannot make schema changes or contain documents. Contact us if you need any of these capabilities.
Copying is billed at a 50% write discount which stacks with the up to 50% discount for batched writes. This is a faster, cheaper alternative to re-upserting documents for backups and namespaces that share documents.
Example: "source-namespace"
By default, the schema is inferred from the passed data. See Defining the Schema below for details.
There are cases where you need to manually specify the schema because turbopuffer can't infer it automatically: for example, to specify UUID types, configure full-text search for an attribute, or disable filtering for an attribute.
Example: {"permissions": "[]uuid", "text": {"type": "string", "full_text_search": true}, "encrypted_blob": {"type": "string", "filterable": false}}
Only available as part of our enterprise offerings. Contact us.
Setting a Customer Managed Encryption Key (CMEK) will encrypt all data in a namespace using a secret coming from your cloud KMS. Once set, all subsequent writes to this namespace will be encrypted, but data written prior to this upsert will be unaffected.
Currently, turbopuffer does not re-encrypt data when you rotate key versions, meaning old data will remain encrypted using older key versions, while fresh writes will be encrypted using the latest version. Revoking old key versions will cause data loss. To re-encrypt your data using a more recent key, use the export API to re-upsert into a new namespace.
Example: { "cmek": { "key_name": "projects/myproject/locations/us-central1/keyRings/EXAMPLE/cryptoKeys/KEYNAME" } }
Documents can optionally include attributes, which can be used to filter search results, for FTS indexes, or simply to store additional data.
Attributes must have consistent value types. For example, if a document is upserted containing attribute key foo with a string value, all future documents that specify foo must also use a string value (or null).
The schema is automatically inferred, but can be configured to control type and indexing behavior. Some attribute types, such as the uuid or datetime types, cannot be inferred automatically and must be specified in the schema.
If a new attribute is added, it will default to null for any documents that existed before the attribute was added. If you would like to backfill values for existing documents, use patch_columns.
By default, all attributes are indexed. To disable indexing for an attribute, set the filterable field to false in the schema, for a 50% discount and improved indexing performance.
Some limits apply to attribute sizes and number of attribute names per namespace. See Limits.
Vectors are arrays of f32 or f16 values, and are encoded in the API as an array of JSON numbers or as a base64-encoded string.
To use f16 vectors, the vector field must be explicitly specified in the schema when first creating the namespace.
Each vector in the namespace must have the same number of dimensions.
If using the base64 encoding, the vector must be serialized in little-endian float32 or float16 binary format, then base64-encoded. The base64 string encoding can be more efficient on both the client and server.
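The base64 encoding can be sketched with only the standard library, packing the values little-endian as described above:

```python
import base64
import struct

def encode_f32_vector(vector):
    """Pack f32 values little-endian, then base64-encode the bytes."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")

def decode_f32_vector(encoded):
    """Inverse: base64-decode, then unpack little-endian f32 values."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

encoded = encode_f32_vector([1.0, 2.0, 3.0])
print(encoded)                     # AACAPwAAAEAAAEBA
print(decode_f32_vector(encoded))  # [1.0, 2.0, 3.0]
```

For f16 vectors, the same approach applies with the "e" (half-precision) struct format character instead of "f".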
A namespace can be created without vectors. In this case, the vector key must be omitted from all write requests.
Bulk document operations should use a column-oriented layout for best performance. You can pass any combination of upsert_columns, patch_columns, and deletes to the write request.
If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.
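As a sketch, a hypothetical request body combining the three column-based operations, with a client-side guard against the duplicate-ID error:

```python
import json

payload = {
    "upsert_columns": {"id": [1, 2], "vector": [[1, 2, 3], [4, 5, 6]], "name": ["foo", "bar"]},
    "patch_columns": {"id": [3], "name": ["baz"]},
    "deletes": [4, 5],
}

# The server rejects requests where the same ID appears more than once
# (HTTP 400), so it is worth checking before sending.
all_ids = (
    payload["upsert_columns"]["id"]
    + payload["patch_columns"]["id"]
    + payload["deletes"]
)
assert len(all_ids) == len(set(all_ids)), "duplicate document IDs in one request"

body = json.dumps(payload)  # ready to send as the JSON request body
```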
Row-based writes may be more convenient than column-based writes. You can pass any combination of upsert_rows, patch_rows, and deletes to the write request.
If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.
To delete documents that match a filter, use delete_by_filter. This operation returns the actual number of documents removed.
Because the operation internally issues a query to determine which documents to delete, this operation is billed as both a query and a write operation.
delete_by_filter has the same syntax as the filters parameter in the query API.
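For instance, a request body deleting every document whose page_id equals "123" (filter syntax as in the query API) could look like:

```python
import json

payload = {"delete_by_filter": ["page_id", "Eq", "123"]}
body = json.dumps(payload)
print(body)  # {"delete_by_filter": ["page_id", "Eq", "123"]}
```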
The schema is optionally set on upsert to configure type and indexing behavior. By default, types are automatically inferred from the passed data and every attribute is indexed. To see what types were inferred, you can inspect the schema.
The schema documentation lists all supported attribute types and indexing options. A few examples where manually configuring the schema is needed:
1. Specifying attribute types that cannot be inferred automatically, such as uuid.
2. Configuring full-text search for a string attribute.
3. Disabling indexing (filterable: false) for an attribute, for a 50% discount and improved indexing performance.
You can choose to pass the schema on every upsert, or only the first. There's no performance difference.
If a new attribute is added, it will default to null for any documents that existed before the attribute was added.
An example of (1), (2), and (3) on upsert:
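A sketch of such a request body, mirroring the schema example earlier on this page (attribute names and values are illustrative):

```python
import json

payload = {
    "upsert_rows": [
        {
            "id": 1,
            "permissions": ["5a3c1a9c-0000-0000-0000-000000000000"],  # illustrative UUID
            "text": "the quick brown fox",
            "encrypted_blob": "opaque-payload",
        }
    ],
    "schema": {
        # (1) a type that cannot be inferred automatically
        "permissions": "[]uuid",
        # (2) full-text search on a string attribute
        "text": {"type": "string", "full_text_search": True},
        # (3) indexing disabled for a 50% discount
        "encrypted_blob": {"type": "string", "filterable": False},
    },
}

body = json.dumps(payload)  # ready to send as the JSON request body
```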