Write Documents

POST /v2/namespaces/:namespace

Creates, updates, or deletes documents.

Latency

Upsert latency (500kb docs)

Percentile    Latency
p50           285ms
p90           370ms
p99           688ms

A :namespace is an isolated set of documents and is implicitly created when the first document is inserted. Within a namespace, documents are uniquely referred to by their ID. Upserting a document will overwrite any existing document with the same ID.

A namespace name can only contain ASCII alphanumeric characters, plus the -, _, and . special characters, and cannot be longer than 128 characters (i.e. must match [A-Za-z0-9-_.]{1,128}).
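As a client-side sanity check, the naming rule above can be sketched in a few lines of Python. The helper name is_valid_namespace is hypothetical and not part of any turbopuffer client; only the pattern comes from the text.

```python
import re

# Pattern from the text: ASCII alphanumerics plus '-', '_' and '.',
# between 1 and 128 characters long.
NAMESPACE_RE = re.compile(r"[A-Za-z0-9\-_.]{1,128}")


def is_valid_namespace(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rule.

    Hypothetical helper for illustration; the server remains the authority.
    """
    return NAMESPACE_RE.fullmatch(name) is not None
```

Checking names before sending a request avoids a round trip for an obviously invalid namespace.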

For performance, we recommend creating a namespace per isolated document space instead of filtering when possible. See Performance.

If a write request returns OK, data is guaranteed to be atomically and durably written to object storage. By default, writes are immediately visible to queries. You can read more about how upserts work on the Architecture page.

Each write request can have a maximum payload size of 256 MB. To maximize throughput and minimize cost, we recommend writing in large batches.
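One way to keep batches large while staying under the payload limit is to group rows by their serialized size. The sketch below is an assumption-laden illustration: batch_rows is a hypothetical helper, it only estimates the size of the row array (not the rest of the request body), and the max_bytes value is left to the caller so that headroom can be kept below the documented 256 MB limit.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def batch_rows(rows: Iterable[Dict[str, Any]], max_bytes: int) -> Iterator[List[Dict[str, Any]]]:
    """Group rows into batches whose compact JSON encoding stays within max_bytes.

    Hypothetical helper: the estimate covers only the JSON array of rows.
    A single row larger than max_bytes is still yielded, alone in its own batch.
    """
    batch: List[Dict[str, Any]] = []
    size = 2  # the enclosing "[" and "]"
    for row in rows:
        # +1 accounts for the comma separating this row from its neighbour.
        row_bytes = len(json.dumps(row, separators=(",", ":")).encode("utf-8")) + 1
        if batch and size + row_bytes > max_bytes:
            yield batch
            batch, size = [], 2
        batch.append(row)
        size += row_bytes
    if batch:
        yield batch
```

Each yielded batch could then be sent as the upsert_rows value of one write request.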

Every write must be associated with a document ID. Document IDs are unsigned 64-bit integers, 128-bit UUIDs, or strings. Mixing ID types in a namespace is not supported.

turbopuffer supports the following types of writes:

Upserts: creates or overwrites an entire document.
Patches: updates one or more attributes of an existing document.
Deletes: deletes an entire document by ID.
Conditional writes: upsert, patch, or delete a document only if a condition is met.
Delete by filter: deletes documents that match a filter.
Copy from namespace: copies all documents from another namespace.

Parameters

upsert_rows array

Upserts documents in a row-based format. Each row is an object with an id key, and a number of other keys.

The id key is required, and must contain a document ID.

The vector key is required if the namespace has a vector index. For non-vector namespaces, this key should be omitted. If present, it must contain a vector.

Any other keys will be stored as attributes.

Example: [{"id": 1, "vector": [1, 2, 3], "name": "foo"}, {"id": 2, "vector": [4, 5, 6], "name": "bar"}]

upsert_columns object

Upserts documents in a column-based format. This field is an object, where each key is the name of a column, and each value is an array of values for that column.

The id key is required, and must contain an array of document IDs.

The vector key is required if the namespace has a vector index. For non-vector namespaces, this key should be omitted. If present, it must contain an array of vectors.

Any other keys will be stored as attributes.

Each column must be the same length. When a document doesn't have a value for a given column, pass null.

Example: {"id": [1, 2], "vector": [[1, 2, 3], [4, 5, 6]], "name": ["foo", "bar"]}

Note: the v1 write API used null vectors to represent deletes. This is no longer the case in the v2 API - use the deletes field instead.
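The row and column formats carry the same information, so one can be derived from the other. The following sketch converts a list of rows into the column layout, filling missing values with None (serialized as JSON null). The function name rows_to_columns is hypothetical.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert row-based documents into a column-based object.

    Every column has one entry per row; a row that lacks a key gets None,
    which json.dumps serializes as null.
    """
    keys: List[str] = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)  # preserve first-seen key order
    return {key: [row.get(key) for row in rows] for key in keys}
```

For example, the two rows {"id": 1, "vector": [1, 2, 3], "name": "foo"} and {"id": 2, "vector": [4, 5, 6]} become a column object whose name column is ["foo", None].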

patch_rows array

Patches documents in a row-based format. Identical to upsert_rows, but instead of overwriting entire documents, only the specified keys are written.

The vector key currently cannot be patched.

Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.

Example: [{"id": 1, "name": "baz"}, {"id": 2, "name": "qux"}]

Patches are billed for the size of the patched attributes (not the full written documents), plus the cost of one query per write request (to read all the patched documents touched by the request).

patch_columns object

Patches documents in a column-based format. Identical to upsert_columns, but instead of overwriting entire documents, only the specified keys are written.

The vector key currently cannot be patched.

Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.

Example: {"id": [1, 2], "name": ["baz", "qux"]}
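To make the patch semantics concrete, here is a small in-memory model of what the text describes: only the specified keys change, unknown IDs are ignored, and the vector key is rejected. This is a local illustration, not turbopuffer's implementation; apply_patch_rows and the dict-based store are hypothetical.

```python
from typing import Any, Dict, List


def apply_patch_rows(store: Dict[Any, Dict[str, Any]], patch_rows: List[Dict[str, Any]]) -> int:
    """Apply patches to an in-memory {id: attributes} dict.

    Returns the number of documents that were actually patched.
    """
    patched = 0
    for row in patch_rows:
        if "vector" in row:
            raise ValueError("the vector key currently cannot be patched")
        doc = store.get(row["id"])
        if doc is None:
            continue  # missing IDs are ignored; patches never create documents
        for key, value in row.items():
            if key != "id":
                doc[key] = value  # other attributes are left untouched
        patched += 1
    return patched
```

Patching {"id": 1, "name": "baz"} against a stored document {"name": "foo", "color": "red"} leaves color unchanged and only rewrites name.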

deletes array

Deletes documents by ID. Must be an array of document IDs.

Example: [1, 2, 3]

upsert_condition object

Makes each write in upsert_rows and upsert_columns conditional on the upsert_condition being satisfied for the document with the corresponding ID.

The upsert_condition is evaluated before each write, using the current value of the document with the matching ID.

If the document exists and the condition is met, the write is applied (i.e. the document is updated).
If the document exists and the condition is not met, the write is skipped.
If the document does not exist, the write is applied unconditionally (i.e. the document is created).

The condition syntax matches the filters parameter in the query API, with an additional feature: you can reference the new value being written using $ref_new references. These look like {"$ref_new": "attr_123"} and can be used in place of value literals.

Example: ["Or", [["updated_at", "Lt", {"$ref_new": "updated_at"}], ["updated_at", "Eq", null]]]

This condition ensures that each upsert is only processed if the new document value has a newer "updated_at" timestamp than its current version.
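The way $ref_new is resolved can be illustrated with a tiny local evaluator that handles only the operators used in the example (Or, Lt, Eq). It is a sketch for reasoning about conditions, not the server's evaluator. In particular, treating Lt against a null operand as false is an assumption, and the function names are hypothetical.

```python
from typing import Any, Dict, List


def resolve(value: Any, new_doc: Dict[str, Any]) -> Any:
    """Replace a {"$ref_new": attr} reference with the value being written."""
    if isinstance(value, dict) and "$ref_new" in value:
        return new_doc.get(value["$ref_new"])
    return value


def evaluate(condition: List[Any], current: Dict[str, Any], new_doc: Dict[str, Any]) -> bool:
    """Evaluate a condition against the current document (local model only)."""
    if len(condition) == 2 and condition[0] == "Or":
        return any(evaluate(sub, current, new_doc) for sub in condition[1])
    attr, op, raw = condition
    lhs, rhs = current.get(attr), resolve(raw, new_doc)
    if op == "Eq":
        return lhs == rhs
    if op == "Lt":
        # Assumption: a comparison involving null is never satisfied.
        return lhs is not None and rhs is not None and lhs < rhs
    raise ValueError(f"operator {op!r} is not modelled in this sketch")


condition = ["Or", [["updated_at", "Lt", {"$ref_new": "updated_at"}],
                    ["updated_at", "Eq", None]]]
```

With this condition, a stored document with updated_at 5 accepts an incoming document with updated_at 10, while the reverse is skipped.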

patch_condition object

Like upsert_condition, but for patch_rows and patch_columns.

Any patches to IDs that don't already exist in the namespace will be ignored without evaluating the condition; patches will not create any missing documents.

delete_condition object

Like upsert_condition, but for deletes.

$ref_new references are given a null value for all attributes.

Does not apply to delete_by_filter. Prefer this over delete_by_filter when the set of IDs to conditionally delete is known ahead of time.

delete_by_filter object

You can delete documents that match a filter using delete_by_filter. It has the same syntax as the filters parameter in the query API.

If delete_by_filter is used in the same request as other write operations, delete_by_filter will be applied before the other operations. This allows you to delete rows that match a filter before writing new rows with overlapping IDs. Note that patches to any deleted rows are ignored.

delete_by_filter is different from deletes with a delete_condition:

delete_by_filter: searches across the namespace for any matching document IDs, deleting all matches that it finds.
deletes + delete_condition: only evaluates the condition on the IDs identified in deletes.

delete_condition does not apply to delete_by_filter.

Example: ["page_id", "Eq", 123]

delete_by_filter is billed the same as normal deletes, plus the cost of one query per write request (to determine which IDs to delete).
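The ordering rule (delete_by_filter runs before the other operations in the same request) can be modelled locally as follows. This is an in-memory illustration under stated assumptions: apply_request is hypothetical, only the Eq operator is modelled, and only upsert_rows is handled alongside the filter.

```python
from typing import Any, Dict


def apply_request(store: Dict[Any, Dict[str, Any]], request: Dict[str, Any]) -> int:
    """Apply delete_by_filter first, then upsert_rows, to an in-memory store.

    Returns the number of documents removed by the filter.
    """
    removed = 0
    if "delete_by_filter" in request:
        attr, op, value = request["delete_by_filter"]
        if op != "Eq":
            raise ValueError("this sketch only models the Eq operator")
        matching = [doc_id for doc_id, doc in store.items() if doc.get(attr) == value]
        for doc_id in matching:
            del store[doc_id]
            removed += 1
    # Upserts run after the filter, so rows with overlapping IDs survive.
    for row in request.get("upsert_rows", []):
        store[row["id"]] = {k: v for k, v in row.items() if k != "id"}
    return removed
```

This pattern replaces every document for a page in one request: the old rows for page_id 123 are removed, then the new rows for the same page are written.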

distance_metric cosine_distance | euclidean_squared (required unless copy_from_namespace is set or no vector is set)

The function used to calculate vector similarity. Possible values are cosine_distance or euclidean_squared.

cosine_distance is defined as 1 - cosine_similarity and ranges from 0 to 2. Lower is better.

euclidean_squared is defined as sum((x - y)^2). Lower is better.
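Both metrics follow directly from the definitions above. The pure-Python sketch below is for illustration only and says nothing about how turbopuffer computes them internally. It assumes equal-length, non-zero vectors (cosine distance is undefined for a zero vector).

```python
import math
from typing import Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine_similarity; ranges from 0 (same direction) to 2 (opposite)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def euclidean_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """sum((x - y)^2), the squared straight-line distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))
```

For the vectors [1, 2, 3] and [4, 5, 6] used in the earlier examples, euclidean_squared is 9 + 9 + 9 = 27.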

copy_from_namespace string

Copy all documents from a namespace into this namespace. This operation is currently limited to copying within the same region and organization. The initial request currently cannot make schema changes or contain documents. Contact us if you need any of this.

Copying is billed at a 50% write discount, which stacks with the discount of up to 50% for batched writes. This is a faster, cheaper alternative to re-upserting documents for backups and for namespaces that share documents.

Example: "source-namespace"

schema object

By default, the schema is inferred from the passed data. See Defining the Schema below for details.

There are cases where you want to manually specify the schema because turbopuffer can't automatically infer it. For example, to specify UUID types, configure full-text search for an attribute, or disable filtering for an attribute.

Example: {"permissions": "[]uuid", "text": {"type": "string", "full_text_search": true}, "encrypted_blob": {"type": "string", "filterable": false}}

encryption object (optional)

Only available as part of our scale and enterprise plans.

Setting a Customer Managed Encryption Key (CMEK) will encrypt all data in a namespace using a secret coming from your cloud KMS. Once set, all subsequent writes to this namespace will be encrypted, but data written prior to this upsert will be unaffected.

Currently, turbopuffer does not re-encrypt data when you rotate key versions, meaning old data will remain encrypted using older key versions, while fresh writes will be encrypted using the latest versions. Revoking old key versions will cause data loss. To re-encrypt your data using a more recent key, use the export API to re-upsert into a new namespace.

Example (GCP): { "cmek": { "key_name": "projects/myproject/locations/us-central1/keyRings/EXAMPLE/cryptoKeys/KEYNAME" } }

Example (AWS): { "cmek": { "key_name": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012" } }

Attributes

Documents can optionally include attributes, which can be used to filter search results, for FTS indexes, or simply to store additional data.

Attribute names can be up to 128 characters in length and must not start with a $ character.

Attributes must have consistent value types. For example, if a document is upserted containing attribute key foo with a string value, all future documents that specify foo must also use a string value (or null).

The schema is automatically inferred, but can be configured to control type and indexing behavior. Some attribute types, such as the uuid or datetime type, cannot be inferred automatically, and must be specified in the schema.

If a new attribute is added, this attribute will default to null for any documents that existed before the attribute was added. If you would like to backfill values for existing documents, use patch_rows or patch_columns.

By default, all attributes are indexed into an inverted index. An inverted index keeps filters fast, even when they intersect large sets of documents.

To disable indexing for an attribute, set the filterable field to false in the schema, for a 50% discount and improved indexing performance. The attribute can still be returned, but not filtered.

Some limits apply to attribute sizes and number of attribute names per namespace. See Limits.

Vectors

Vectors are arrays of f32 or f16 values, and are encoded in the API as an array of JSON numbers or as a base64-encoded string.

To use f16 vectors, the vector field must be explicitly specified in the schema when first creating the namespace.

Each vector in the namespace must have the same number of dimensions.

If using the base64 encoding, the vector must be serialized in little-endian float32 or float16 binary format, then base64-encoded. The base64 string encoding can be more efficient on both the client and server.
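The base64 encoding described above can be produced with the Python standard library. In this sketch the helper names are hypothetical. The struct format characters "f" (float32) and "e" (float16) are standard, and the "<" prefix selects little-endian byte order.

```python
import base64
import struct
from typing import List, Sequence


def encode_vector(vector: Sequence[float], fmt: str = "f") -> str:
    """Pack a vector as little-endian float32 ("f") or float16 ("e"), then base64-encode it."""
    packed = struct.pack(f"<{len(vector)}{fmt}", *vector)
    return base64.b64encode(packed).decode("ascii")


def decode_vector(encoded: str, fmt: str = "f") -> List[float]:
    """Inverse of encode_vector, useful for checking a round trip locally."""
    raw = base64.b64decode(encoded)
    count = len(raw) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", raw))
```

The resulting string is passed as the value of the vector key in place of a JSON array of numbers.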

A namespace can be created without vectors. In this case, the vector key must be omitted from all write requests.

Examples

Row-based writes

Row-based writes may be more convenient than column-based writes. You can pass any combination of upsert_rows, patch_rows, deletes, and delete_by_filter to the write request.

If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.

Column-based writes

Bulk document operations should use a column-oriented layout for best performance. You can pass any combination of upsert_columns, patch_columns, deletes, and delete_by_filter to the write request.

If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.
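Since a repeated ID fails the whole request, it can be worth checking a payload before sending it. The sketch below collects IDs from the row-based and column-based fields and from deletes. Treating a repeat across different operations as a duplicate is an assumption, and duplicate_ids is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List


def duplicate_ids(payload: Dict[str, Any]) -> List[Any]:
    """Return the document IDs that appear more than once in a write payload."""
    ids: List[Any] = []
    for key in ("upsert_rows", "patch_rows"):
        ids.extend(row["id"] for row in payload.get(key, []))
    for key in ("upsert_columns", "patch_columns"):
        ids.extend(payload.get(key, {}).get("id", []))
    ids.extend(payload.get("deletes", []))
    # Counter preserves first-seen order, so the result is deterministic.
    return [doc_id for doc_id, count in Counter(ids).items() if count > 1]
```

An empty result means the payload passes this particular check; the server may still reject it for other reasons.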

Conditional writes

To make writes conditional, use the upsert_condition, patch_condition, and delete_condition parameters. These let you specify a condition that must be satisfied for the write operation to each document to proceed.

Conditions are evaluated before each write, using the current value of the document with the matching ID.

If the document exists and the condition is met, the write is applied.
If the document exists and the condition is not met, the write is skipped.
If the document does not exist, the write is applied unconditionally for upserts and skipped unconditionally for patches and deletes.
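The three rules above reduce to a small decision function. This is a local restatement of the documented behavior rather than turbopuffer code. The name write_outcome and the string labels are hypothetical.

```python
def write_outcome(kind: str, exists: bool, condition_met: bool) -> str:
    """Return "applied" or "skipped" for a conditional write.

    kind is one of "upsert", "patch", or "delete".
    """
    if kind not in ("upsert", "patch", "delete"):
        raise ValueError(f"unknown write kind: {kind!r}")
    if not exists:
        # Missing documents: upserts create them; patches and deletes are skipped.
        return "applied" if kind == "upsert" else "skipped"
    return "applied" if condition_met else "skipped"
```

Note that for a missing document the condition is never consulted, so condition_met has no effect in that branch.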

The operation will return the actual number of documents written (upserted, patched, or deleted).

Internally, the operation performs a query (one per batch) to determine which documents match the condition, so it is billed as both a query and a write operation. However, if the condition is not met for a given document, that write is skipped and not billed.

Conditional deletes are distinct from delete_by_filter, which should be used when the set of IDs to conditionally delete is not known ahead of time.

Delete by filter

To delete documents that match a filter, use delete_by_filter. This operation will return the actual number of documents removed.

Because the operation internally issues a query to determine which documents to delete, this operation is billed as both a query and a write operation.

delete_by_filter has the same syntax as the filters parameter in the query API.

Schema

The schema is optionally set on upsert to configure type and indexing behavior. By default, types are automatically inferred from the passed data and every attribute is indexed. To see what types were inferred, you can inspect the schema.

The schema documentation lists all supported attribute types and indexing options. A few examples where manually configuring the schema is needed:

1. UUID values serialized as strings can be stored in turbopuffer in an optimized format.
2. Full-text search for a string attribute.
3. Disabling indexing/filtering (filterable:false) for an attribute, for a 50% discount and improved indexing performance.

You can choose to pass the schema on every upsert, or only the first. There's no performance difference. If a new attribute is added, this attribute will default to null for any documents that existed before the attribute was added.

An example of (1), (2), and (3) on upsert:
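A minimal sketch of such a request body, built in Python, is shown here. The schema object is the one given in the schema parameter description. The document values, the placeholder UUID, and the two-dimensional vector are made up for illustration, and sending the request (URL, authentication) is deliberately left out.

```python
import json
import uuid

# Placeholder UUID for illustration only.
example_uuid = str(uuid.UUID(int=1))

payload = {
    "upsert_rows": [
        {
            "id": 1,
            "vector": [0.1, 0.2],
            "permissions": [example_uuid],       # stored as []uuid
            "text": "the quick brown fox",       # indexed for full-text search
            "encrypted_blob": "b3BhcXVlIGRhdGE=",  # returned but not filterable
        }
    ],
    "distance_metric": "cosine_distance",
    "schema": {
        "permissions": "[]uuid",
        "text": {"type": "string", "full_text_search": True},
        "encrypted_blob": {"type": "string", "filterable": False},
    },
}

# JSON body for POST /v2/namespaces/:namespace
body = json.dumps(payload)
```

Because there is no performance difference, the same schema object can be attached to every upsert or only to the first one.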