Write Documents

POST /v2/namespaces/:namespace

Creates, updates, or deletes documents.

Latency

Upsert latency, 500 KB documents:

  Percentile   Latency
  p50          285ms
  p90          370ms
  p99          688ms

A :namespace is an isolated set of documents and is implicitly created when the first document is inserted. Within a namespace, documents are uniquely referred to by their ID. Upserting a document will overwrite any existing document with the same ID.

A namespace name can contain only ASCII alphanumeric characters plus the -, _, and . special characters, and must be at most 128 characters long (i.e. it must match [A-Za-z0-9-_.]{1,128}).

For performance, we recommend creating one namespace per isolated document space, rather than storing everything in a single namespace and filtering, whenever possible. See Performance.

If a write request returns OK, data is guaranteed to be atomically and durably written to object storage. By default, writes are immediately visible to queries. You can read more about how upserts work on the Architecture page.

Each write request can have a maximum payload size of 256 MB. To maximize throughput and minimize cost, we recommend writing in large batches.

Every write must be associated with a document ID. Document IDs are unsigned 64-bit integers, 128-bit UUIDs, or strings. Mixing ID types in a namespace is not supported.

turbopuffer supports the following types of writes:

  • Upserts: creates or overwrites an entire document.
  • Patches: updates one or more attributes of an existing document.
  • Deletes: deletes an entire document by ID.
  • Delete by filter: deletes documents that match a filter.
  • Copy from namespace: copies all documents from another namespace.

Parameters

upsert_columns object

Upserts documents in a column-based format. This field is an object, where each key is the name of a column, and each value is an array of values for that column.

The id key is required, and must contain an array of document IDs.

The vector key is required if the namespace has a vector index. For non-vector namespaces, this key should be omitted. If present, it must contain an array of vectors.

Any other keys will be stored as attributes.

Each column must be the same length. When a document doesn't have a value for a given column, pass null.

Example: {"id": [1, 2], "vector": [[1, 2, 3], [4, 5, 6]], "name": ["foo", "bar"]}

Note: the v1 write API used null vectors to represent deletes. This is no longer the case in the v2 API; use the deletes field instead.


upsert_rows array

Upserts documents in a row-based format. Each row is an object with an id key, and a number of other keys.

The id key is required, and must contain a document ID.

The vector key is required if the namespace has a vector index. For non-vector namespaces, this key should be omitted. If present, it must contain a vector.

Any other keys will be stored as attributes.

Example: [{"id": 1, "vector": [1, 2, 3], "name": "foo"}, {"id": 2, "vector": [4, 5, 6], "name": "bar"}]


patch_columns object

Patches documents in a column-based format. Identical to upsert_columns, but instead of overwriting entire documents, only the specified keys are written.

The vector key currently cannot be patched.

Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.

Example: {"id": [1, 2], "name": ["baz", "qux"]}


patch_rows array

Patches documents in a row-based format. Identical to upsert_rows, but instead of overwriting entire documents, only the specified keys are written.

The vector key currently cannot be patched.

Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.

Example: [{"id": 1, "name": "baz"}, {"id": 2, "name": "qux"}]


deletes array

Deletes documents by ID. Must be an array of document IDs.

Example: [1, 2, 3]


delete_by_filter object

You can delete documents that match a filter using delete_by_filter. It has the same syntax as the filters parameter in the query API.

Example: ["page_id", "Eq", "123"]


distance_metric cosine_distance | euclidean_squared (required unless copy_from_namespace is set)

The function used to calculate vector similarity. Possible values are cosine_distance or euclidean_squared.

cosine_distance is defined as 1 - cosine_similarity and ranges from 0 to 2. Lower is better.

euclidean_squared is defined as sum((x - y)^2). Lower is better.
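Both metrics are straightforward to compute locally, which can help when sanity-checking query scores. A minimal Python sketch using only the standard library (this is plain math, not the turbopuffer API):

```python
import math

def cosine_distance(x, y):
    # 1 - cosine_similarity; ranges from 0 (same direction) to 2 (opposite).
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (norm_x * norm_y)

def euclidean_squared(x, y):
    # sum((x - y)^2); note no square root is taken.
    return sum((a - b) ** 2 for a, b in zip(x, y))

print(cosine_distance([1, 0], [0, 1]))       # orthogonal vectors → 1.0
print(euclidean_squared([1, 2, 3], [4, 5, 6]))  # → 27
```

In both cases, lower values mean more similar vectors.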


copy_from_namespace string

Copy all documents from a namespace into this namespace. This operation is currently limited to copying within the same region and organization. The initial request currently cannot make schema changes or contain documents. Contact us if you need any of this.

Copying is billed at a 50% write discount, which stacks with the up-to-50% discount for batched writes. This is a faster, cheaper alternative to re-upserting documents when creating backups or namespaces that share documents.

Example: "source-namespace"
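As a sketch, the write body for a copy is just this one field (namespace names here are illustrative):

```python
import json

# Copy all documents from "source-namespace" into the namespace named in
# the request URL. The initial request cannot also contain documents or
# schema changes, and both namespaces must be in the same region and
# organization.
body = {"copy_from_namespace": "source-namespace"}

# Send as the JSON payload of POST /v2/namespaces/:namespace.
print(json.dumps(body))
```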


schema object

By default, the schema is inferred from the passed data. See Defining the Schema below for details.

There are cases where you want to manually specify the schema because turbopuffer can't automatically infer it. For example, to specify UUID types, configure full-text search for an attribute, or disable filtering for an attribute.

Example: {"permissions": "[]uuid", "text": {"type": "string", "full_text_search": true}, "encrypted_blob": {"type": "string", "filterable": false}}


encryption object (optional)

Only available as part of our enterprise offerings. Contact us.

Setting a Customer Managed Encryption Key (CMEK) will encrypt all data in a namespace using a secret coming from your cloud KMS. Once set, all subsequent writes to this namespace will be encrypted, but data written prior to this upsert will be unaffected.

Currently, turbopuffer does not re-encrypt data when you rotate key versions, meaning old data will remain encrypted using older key versions, while fresh writes will be encrypted using the latest version. Revoking old key versions will cause data loss. To re-encrypt your data using a more recent key, use the export API to re-upsert into a new namespace.

Example: { "cmek": { "key_name": "projects/myproject/locations/us-central1/keyRings/EXAMPLE/cryptoKeys/KEYNAME" } }

Attributes

Documents can optionally include attributes, which can be used to filter search results, for FTS indexes, or simply to store additional data.

Attributes must have consistent value types. For example, if a document is upserted containing attribute key foo with a string value, all future documents that specify foo must also use a string value (or null).

The schema is automatically inferred, but can be configured to control type and indexing behavior. Some attribute types, such as the uuid or datetime type, cannot be inferred automatically, and must be specified in the schema.

If a new attribute is added, this attribute will default to null for any documents that existed before the attribute was added. If you would like to backfill values for existing documents, use patch_columns.

By default, all attributes are indexed. To disable indexing for an attribute, set the filterable field to false in the schema, for a 50% discount and improved indexing performance.

Some limits apply to attribute sizes and number of attribute names per namespace. See Limits.

Vectors

Vectors are arrays of f32 or f16 values, and are encoded in the API as an array of JSON numbers or as a base64-encoded string.

To use f16 vectors, the vector field must be explicitly specified in the schema when first creating the namespace.

Each vector in the namespace must have the same number of dimensions.

If using the base64 encoding, the vector must be serialized in little-endian float32 or float16 binary format, then base64-encoded. The base64 string encoding can be more efficient on both the client and server.
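The encoding described above can be sketched in Python using only the standard library (struct handles the little-endian packing):

```python
import base64
import struct

def encode_f32_vector(vec):
    # Pack as little-endian float32 (4 bytes per dimension), then base64-encode.
    raw = struct.pack(f"<{len(vec)}f", *vec)
    return base64.b64encode(raw).decode("ascii")

def decode_f32_vector(s):
    # Reverse: base64-decode, then unpack little-endian float32 values.
    raw = base64.b64decode(s)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

encoded = encode_f32_vector([1.0, 2.0, 3.0])
print(encoded)                     # compact base64 string
print(decode_f32_vector(encoded))  # → [1.0, 2.0, 3.0]
```

For f16 vectors the same approach applies with the "e" (half-precision) struct format character and 2 bytes per dimension.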

A namespace can be created without vectors. In this case, the vector key must be omitted from all write requests.

Examples

Column-based writes

Bulk document operations should use a column-oriented layout for best performance. You can pass any combination of upsert_columns, patch_columns, and deletes to the write request.

If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.
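A hedged sketch of a column-based write body (values are illustrative; distance_metric is included because this hypothetical namespace stores vectors). Note the null (None in Python) for a document that has no value for the name column:

```python
import json

# Illustrative column-based write body: upsert two documents, delete a third.
body = {
    "distance_metric": "cosine_distance",
    "upsert_columns": {
        "id": [1, 2],
        "vector": [[0.1, 0.2], [0.3, 0.4]],
        "name": ["foo", None],  # null where a document lacks a value
    },
    "deletes": [3],  # IDs here must not also appear in upsert_columns
}

# Send as the JSON payload of POST /v2/namespaces/:namespace.
print(json.dumps(body))
```

All columns must have the same length, and every document ID in the request must be unique.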

Row-based writes

Row-based writes may be more convenient than column-based writes. You can pass any combination of upsert_rows, patch_rows, and deletes to the write request.

If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.
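A hedged sketch of a row-based write body combining all three operations (values are illustrative):

```python
import json

# Illustrative row-based write body: upsert two documents, patch one
# attribute of an existing document, and delete another document.
body = {
    "distance_metric": "cosine_distance",
    "upsert_rows": [
        {"id": 1, "vector": [0.1, 0.2], "name": "foo"},
        {"id": 2, "vector": [0.3, 0.4], "name": "bar"},
    ],
    "patch_rows": [{"id": 4, "name": "baz"}],  # only "name" is rewritten
    "deletes": [3],
}

# Send as the JSON payload of POST /v2/namespaces/:namespace.
print(json.dumps(body))
```

Each document ID appears in exactly one of the three fields, satisfying the uniqueness requirement above.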

Delete by filter

To delete documents that match a filter, use delete_by_filter. This operation will return the actual number of documents removed.

Because the operation internally issues a query to determine which documents to delete, this operation is billed as both a query and a write operation.

delete_by_filter has the same syntax as the filters parameter in the query API.
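A minimal sketch of a delete-by-filter body (the attribute name and value are illustrative):

```python
import json

# Delete every document whose page_id attribute equals "123".
# The filter syntax is the same as the query API's filters parameter.
body = {"delete_by_filter": ["page_id", "Eq", "123"]}

# Send as the JSON payload of POST /v2/namespaces/:namespace.
print(json.dumps(body))
```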

Schema

The schema is optionally set on upsert to configure type and indexing behavior. By default, types are automatically inferred from the passed data and every attribute is indexed. To see what types were inferred, you can inspect the schema.

The schema documentation lists all supported attribute types and indexing options. A few examples where manually configuring the schema is needed:

  1. Storing UUID values, serialized as strings, in an optimized format.
  2. Enabling full-text search for a string attribute.
  3. Disabling indexing/filtering (filterable: false) for an attribute, for a 50% discount and improved indexing performance.

You can choose to pass the schema on every upsert, or only the first. There's no performance difference. If a new attribute is added, this attribute will default to null for any documents that existed before the attribute was added.

An example of (1), (2), and (3) on upsert: