POST /v2/namespaces/:namespace

Creates, updates, or deletes documents.

A :namespace is an isolated set of documents and is implicitly created when the first document is inserted. Namespace names must match [A-Za-z0-9-_.]{1,128}.

We recommend creating a namespace per isolated document space instead of filtering when possible. Large batches of writes are highly encouraged to maximize throughput and minimize cost. Write requests can have a payload size of up to 512 MB. See Performance.
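
As a rough sketch of client-side batching, the helper below groups rows into batches whose estimated JSON size stays under a byte budget. The name batch_rows and the max_bytes parameter are illustrative and not part of turbopuffer. The estimate ignores array brackets, separators, and any other request fields, so leave headroom below the 512 MB limit.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def batch_rows(rows: Iterable[Dict[str, Any]], max_bytes: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of rows whose summed JSON-encoded size is at most max_bytes.

    A single row larger than max_bytes is still yielded, alone in its own batch.
    """
    batch: List[Dict[str, Any]] = []
    size = 0
    for row in rows:
        row_bytes = len(json.dumps(row).encode("utf-8"))
        # Start a new batch when adding this row would exceed the budget.
        if batch and size + row_bytes > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += row_bytes
    if batch:
        yield batch
```

Each yielded batch could then be sent as upsert_rows in its own write request.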

Within a namespace, documents are uniquely referred to by their ID. Document IDs are unsigned 64-bit integers, 128-bit UUIDs, or strings up to 64 bytes.
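
The naming and ID rules above can be checked on the client before sending a request. This is a minimal sketch: the function names are hypothetical, and the mapping to Python types (int for unsigned 64-bit integers, uuid.UUID or str for the rest) is an assumption. It does not decide whether empty string IDs are accepted, which the text above does not address.

```python
import re
import uuid
from typing import Any

# Same character set and length bounds as the pattern quoted above.
NAMESPACE_PATTERN = re.compile(r"[A-Za-z0-9\-_.]{1,128}")


def is_valid_namespace(name: str) -> bool:
    """Return True if the whole name matches the namespace pattern."""
    return NAMESPACE_PATTERN.fullmatch(name) is not None


def is_valid_document_id(doc_id: Any) -> bool:
    """Return True for an unsigned 64-bit int, a UUID, or a string of at most 64 bytes."""
    if isinstance(doc_id, bool):  # bool is a subclass of int in Python; reject it
        return False
    if isinstance(doc_id, int):
        return 0 <= doc_id <= 2**64 - 1
    if isinstance(doc_id, uuid.UUID):
        return True
    if isinstance(doc_id, str):
        return len(doc_id.encode("utf-8")) <= 64
    return False
```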

turbopuffer supports the following types of writes:

Request

upsert_rows
array

Upserts documents in a row-based format. Each row is an object with an id field containing the document ID, plus any number of other attribute fields.

Existing documents with matching IDs are overwritten entirely. Use patch_rows to update only specific attributes.

A namespace may or may not have vector indexes. If it does, all documents must include all vector attributes.

example
[ { "id": 1, "vector": [1, 2, 3], "name": "foo" }, { "id": 2, "vector": [4, 5, 6], "name": "bar" } ]

upsert_columns
object

Upserts documents in a column-based format. This field is an object, where each key is the name of a column, and each value is an array of values for that column.

Existing documents with matching IDs are overwritten entirely. Use patch_columns to update only specific attributes.

The id key is required, and must contain an array of document IDs. All vector attribute columns are required if the namespace has vector indexes. Other keys will be stored as attributes.

Each column must be the same length. When a document doesn't have a value for a given column, pass null.

example
{ "id": [1, 2], "vector": [[1, 2, 3], [4, 5, 6]], "name": ["foo", "bar"] }
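
If your data starts out as rows, a small helper can produce the column layout described above, passing None (JSON null) where a row has no value for a column. The helper name rows_to_columns is illustrative and not part of any client library.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert row objects into one equal-length list per column."""
    # Collect column names in first-seen order so "id" stays first if it leads the rows.
    keys: List[str] = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    # Missing values become None so every column has len(rows) entries.
    return {key: [row.get(key) for row in rows] for key in keys}
```

For the two rows in the upsert_rows example, dropping "name" from the second row would give a "name" column of ["foo", None].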

patch_rows
array

Patches documents in a row-based format. Identical to upsert_rows, but instead of overwriting entire documents, only the specified keys are written.

Vector attributes currently cannot be patched. You currently need to retrieve and upsert the entire document.

Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.

example
[ { "id": 1, "name": "baz" }, { "id": 2, "name": "qux" } ]

Patches are billed for the size of the patched attributes (not the full written documents), plus the cost of one query per write request (to read all the patched documents touched by the request).


patch_columns
object

Patches documents in a column-based format. Identical to upsert_columns, but instead of overwriting entire documents, only the specified keys are written.

Vector attributes currently cannot be patched. You currently need to retrieve and upsert the entire document.

Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.

example
{ "id": [1, 2], "name": ["baz", "qux"] }

deletes
array

Deletes documents by ID. Must be an array of document IDs.

example
[ 1, 2, 3 ]

upsert_condition
object

Makes each write in upsert_rows and upsert_columns conditional on the upsert_condition being satisfied for the document with the corresponding ID.

The upsert_condition is evaluated before each write, using the current value of the document with the matching ID.

  • If the document exists and the condition is met, the write is applied (i.e. the document is updated).
  • If the document exists and the condition is not met, the write is skipped.
  • If the document does not exist, the write is applied unconditionally (i.e. the document is created).

The condition syntax matches the filters parameter in the query API, with an additional feature: you can reference the new value being written using $ref_new references. These look like {"$ref_new": "attr_123"} and can be used in place of value literals.

[ "Or", [ [ "updated_at", "Lt", { "$ref_new": "updated_at" } ], ["updated_at", "Eq", null] ] ]

The newer timestamp example ensures that each upsert is only processed if the new document value has a newer updated_at timestamp than its current version.

The insert if not exists example ensures that each upsert only inserts new documents, skipping any writes where a document with that ID already exists. Since existing documents always have a non-null id, this condition fails for them, while new documents are inserted unconditionally.


patch_condition
object

Like upsert_condition, but for patch_rows and patch_columns.

Any patches to IDs that don't already exist in the namespace will be ignored without evaluating the condition; patches will not create any missing documents.

Does not apply to patch_by_filter. Prefer this over patch_by_filter when the set of IDs to conditionally patch is known ahead of time.


delete_condition
object

Like upsert_condition, but for deletes.

$ref_new references are given a null value for all attributes.

Does not apply to delete_by_filter. Prefer this over delete_by_filter when the set of IDs to conditionally delete is known ahead of time.


patch_by_filter
object

You can patch documents that match a filter using patch_by_filter. It accepts an object with two fields:

  • filters: a filter expression (see query filtering)
  • patch: an object containing the patch to apply to all documents matching the filter

If patch_by_filter is used in the same request as other write operations, it is applied after delete_by_filter but before any other write operations.

Vector attributes currently cannot be patched. You currently need to retrieve and upsert the entire document.

example
{ "filters": [ "page_id", "Eq", 123 ], "patch": { "page_id": 124 } }

patch_by_filter is billed as a write and two queries (one for the filter, one for the patch).


delete_by_filter
object

You can delete documents that match a filter using delete_by_filter. It has the same syntax as the filters parameter in the query API.

If delete_by_filter is used in the same request as other write operations, delete_by_filter will be applied before the other operations. This allows you to delete rows that match a filter before writing new rows with overlapping IDs. Note that patches to any deleted rows are ignored.

delete_by_filter is different from deletes with a delete_condition:

  • delete_by_filter: searches across the namespace for any matching document IDs, deleting all matches that it finds.
  • deletes + delete_condition: only evaluates the condition on the IDs identified in deletes.

delete_condition does not apply to delete_by_filter.

example
[ "page_id", "Eq", 123 ]

delete_by_filter is billed the same as normal deletes, plus the cost of one query per write request (to determine which IDs to delete).


patch_by_filter_allow_partial
boolean, default: false

Allows patch_by_filter operations to succeed when the filter matches more than the maximum allowed number of documents.

When set to true, a patch_by_filter will update up to the maximum allowed number of documents, and set rows_remaining to true if any additional documents could match this filter. You should issue another potentially duplicate request to update additional matching documents.

When set to false, a patch_by_filter which matches more than the maximum allowed number of documents will fail and update no documents.


delete_by_filter_allow_partial
boolean, default: false

Allows delete_by_filter operations to succeed when the filter matches more than the maximum allowed number of documents.

When set to true, a delete_by_filter will delete up to the maximum allowed number of documents, and set rows_remaining to true if any additional documents could match this filter. You should issue another potentially duplicate request to delete additional matching documents.

When set to false, a delete_by_filter which matches more than the maximum allowed number of documents will fail and delete no documents.
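
The repeat-until-done pattern for partial deletes can be sketched as a loop over rows_remaining. The helper name delete_all_matching and the max_requests cap are hypothetical, and the block runs against a stand-in for ns.write so it works offline. It assumes the write call accepts delete_by_filter_allow_partial as a keyword argument and that the response exposes rows_deleted and rows_remaining as attributes.

```python
from types import SimpleNamespace
from typing import Any, Callable


def delete_all_matching(write: Callable[..., Any], filters: Any, max_requests: int = 100) -> int:
    """Re-issue a partial delete_by_filter until no matching rows remain."""
    total = 0
    for _ in range(max_requests):
        result = write(delete_by_filter=filters, delete_by_filter_allow_partial=True)
        total += getattr(result, "rows_deleted", 0) or 0
        if not getattr(result, "rows_remaining", False):
            return total
    raise RuntimeError("rows still remaining after max_requests write requests")


# Stand-in for ns.write: the first response reports more matches remaining,
# the second reports none.
_responses = iter([
    SimpleNamespace(rows_deleted=3, rows_remaining=True),
    SimpleNamespace(rows_deleted=2, rows_remaining=False),
])


def fake_write(**kwargs: Any) -> Any:
    return next(_responses)


total_deleted = delete_all_matching(fake_write, ("page_id", "Eq", 123))
print(total_deleted)  # 5
```

With a real client, write would be the namespace's write method. The same loop shape would apply to patch_by_filter with patch_by_filter_allow_partial.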


return_affected_ids
boolean, default: false

If true, the response will include upserted_ids, patched_ids, and deleted_ids arrays containing the IDs of documents that were successfully written.

For conditional writes and filter-based operations, only IDs for writes that succeeded will be included.


distance_metric
cosine_distance | euclidean_squared, required unless copy_from_namespace or branch_from_namespace is set or the namespace has no vector columns

The function used to calculate vector similarity. Possible values are cosine_distance or euclidean_squared.

cosine_distance is defined as 1 - cosine_similarity and ranges from 0 to 2. Lower is better.

euclidean_squared is defined as sum((x - y)^2). Lower is better.
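
To make the two definitions concrete, here is a plain-Python sketch of both metrics. It illustrates the formulas above and is not turbopuffer's implementation; cosine_distance as written raises ZeroDivisionError for a zero vector.

```python
import math
from typing import Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine_similarity; 0 for same direction, 2 for opposite direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norms


def euclidean_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """sum((x - y)^2) over all dimensions."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


print(cosine_distance([1, 0], [-1, 0]))       # 2.0
print(euclidean_squared([1, 2, 3], [4, 5, 6]))  # 27
```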

NOTE: This distance metric will apply to all vector columns configured for this namespace.


copy_from_namespace
string | object

Copy all documents from another namespace into this one. The destination namespace you are copying into must be empty. The initial request currently cannot make schema changes or contain documents.

Copying is billed at up to a 75% write discount (a 50% copy discount that stacks with the up to 50% discount for batched writes). This is a faster, cheaper alternative to re-upserting documents for backups and namespaces that share documents. See the cross-region backups guide for an example. For same-region use cases, consider branch_from_namespace which completes instantly regardless of namespace size.

For copies from another region, the logical size copied is also billed as returned bytes. Same-region copies do not bill returned bytes.

To copy a namespace from a different organization, region, or cloud provider, instead of providing the namespace as a string, provide an object with the following fields:

  • source_namespace (string): the namespace to copy from
  • source_api_key (string, optional): an API key for the organization containing the source namespace. Omit to copy from the same organization as the target namespace.
  • source_region (string, optional): the region of the source namespace (e.g. "aws-us-east-1"). Omit to copy from the same region as the target namespace. Source and destination can be in different cloud providers (e.g. aws-us-east-1 to gcp-us-central1).

By default, the destination namespace will inherit the source namespace's encryption configuration. You can optionally specify a different encryption for the destination namespace. This allows you to copy from a namespace with default encryption to a namespace with customer-managed encryption, or vice-versa, or to use a different CMEK key than the source.

For cross-region copies from a namespace with customer-managed encryption, you must explicitly specify a destination encryption key available in the destination region.

"source-namespace"

Copies of large namespaces can run asynchronously. See Asynchronous requests.
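
As a small illustration of the two accepted shapes for copy_from_namespace, the helper below returns the plain string form when only a namespace name is given and the object form otherwise. The helper name copy_source is hypothetical; the field names are the ones listed above.

```python
from typing import Dict, Optional, Union


def copy_source(
    source_namespace: str,
    source_api_key: Optional[str] = None,
    source_region: Optional[str] = None,
) -> Union[str, Dict[str, str]]:
    """Build a copy_from_namespace value, omitting optional fields that are unset."""
    if source_api_key is None and source_region is None:
        # Same organization and region: the namespace name alone is enough.
        return source_namespace
    spec = {"source_namespace": source_namespace}
    if source_api_key is not None:
        spec["source_api_key"] = source_api_key
    if source_region is not None:
        spec["source_region"] = source_region
    return spec
```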


branch_from_namespace
string

Creates an instant copy-on-write clone of the source namespace. The destination namespace must be empty.

After branching, both namespaces are fully independent — reads, writes, queries, and deletes on one namespace do not affect the other.

Branching is billed at a flat rate of $0.032. See the branching guide for details, examples, and guidance on when to use branching vs copy_from_namespace.

Example: "source-namespace"


schema
object

By default, the schema is inferred from the passed data. See Schema below.

There are cases where you want to manually specify the schema because turbopuffer can't automatically infer it. For example, to specify UUID types, configure full-text search for an attribute, or disable filtering for an attribute.

example
{ "permissions": "[]uuid", "text": { "type": "string", "full_text_search": true }, "encrypted_blob": { "type": "string", "filterable": false } }

encryption
object, optional

Only available as part of our scale and enterprise plans.

Setting a Customer Managed Encryption Key (CMEK) will encrypt all data in a namespace using a secret coming from your cloud KMS. Once set, all subsequent writes to this namespace will be encrypted, but data written prior to this upsert will be unaffected.

Currently, turbopuffer does not re-encrypt data when you rotate key versions, meaning old data will remain encrypted using older key versions, while fresh writes will be encrypted using the latest versions. Revoking old key versions will cause data loss. To re-encrypt your data using a more recent key, use the export API to re-upsert into a new namespace, or use copy_from_namespace with a different encryption key to copy to a newly encrypted namespace.

{ "mode": "customer-managed", "key_name": "projects/myproject/locations/us-central1/keyRings/EXAMPLE/cryptoKeys/KEYNAME" }

disable_backpressure
boolean, default: false

Disables HTTP 429 backpressure on writes when unindexed data exceeds 2 GiB. Useful for initial data loading or bulk updates. When disabled, strongly consistent queries return errors above this threshold, so use eventual consistency instead. Eventually consistent queries search only the first 128 MiB of unindexed data.

Only takes effect for upserts and delete-by-id. Ignored for patch-by-id, patch-by-filter, delete-by-filter, and conditional writes, since those operations require a strongly consistent read of existing rows.

Indexing progress can be tracked through the unindexed_bytes field in the metadata endpoint.

Note that while data is being indexed, the following will not be updated:

Response

rows_affected
number

The total number of rows affected by the write request (sum of upserted, patched, and deleted rows).

rows_upserted
number

The number of rows upserted by the write request. Only present when upsert_rows or upsert_columns is used.

rows_patched
number

The number of rows patched by the write request. Only present when patch_rows or patch_columns or patch_by_filter is used.

When using patch_condition, this reflects only the rows where the condition was met and the patch was applied. Other patches were skipped.

rows_deleted
number

The number of rows deleted by the write request. Only present when deletes or delete_by_filter is used.

When using delete_condition, this reflects only the rows where the condition was met and the deletion occurred. Other deletes were skipped.

rows_remaining
boolean

Filter-based writes like delete_by_filter and patch_by_filter have a maximum number of documents modified per write request. This ensures indexing and consistent reads can keep up with writes and deletes. If this response field is set to true, there are more documents that match the delete_by_filter or patch_by_filter. You should issue another, potentially duplicate, request to update additional matching documents.

The limits are currently:

  • 5M documents for delete_by_filter
  • 50k documents for patch_by_filter

upserted_ids
array

The IDs of documents that were upserted. Only present when return_affected_ids is true and at least one document was upserted.

patched_ids
array

The IDs of documents that were patched. Only present when return_affected_ids is true and at least one document was patched.

deleted_ids
array

The IDs of documents that were deleted. Only present when return_affected_ids is true and at least one document was deleted.

billing
object

The billable resources consumed by the write. The object contains the following fields:

  • billable_logical_bytes_written (uint): the number of logical bytes written to the namespace
  • query (object, optional): query billing information when the write involves a query-like operation (for a conditional write, patch_by_filter, delete_by_filter, or a cross-region copy_from_namespace):
    • billable_logical_bytes_queried (uint): the number of logical bytes processed by queries
    • billable_logical_bytes_returned (uint): the number of logical bytes returned by queries

performance
object

The performance metrics for the write. The object currently contains the following fields, but these fields may change name, type, or meaning in the future:

  • server_total_ms (uint): request time measured on the server, in milliseconds

Attributes

Documents are composed of attributes. All documents must have a unique id attribute. Attribute names can be up to 128 characters in length and must not start with a $ character.

By default, attributes are indexed and thus queries can filter or sort by them. To disable indexing for an attribute, set filterable to false in the schema for a 50% discount and improved indexing performance. The attribute can still be returned, but not used for filtering or sorting.

Attributes must have consistent value types, and are nullable. The type is inferred from the first occurrence of the attribute. Certain non-inferrable types, e.g. uuid or datetime, must be specified in the schema.

Some limits apply to attribute sizes and number of attribute names per namespace. See Limits.

Vectors

Vectors are attributes with a vector type ([N]f32 or [N]f16 where N is the number of dimensions), encoded as either a JSON array of numbers, or as a base64-encoded string. Attributes named vector will automatically be inferred as having vector types; additional vector columns must be explicitly declared in the schema.

If using the base64 encoding, the vector must be serialized in little-endian float32 binary format, then base64-encoded. The base64 string encoding can be more efficient on both the client and server.
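
The base64 encoding described above can be produced with the standard library alone. This is a minimal sketch with illustrative function names: each element is packed as a little-endian float32 and the resulting bytes are base64-encoded.

```python
import base64
import struct
from typing import List, Sequence


def encode_vector_base64(vector: Sequence[float]) -> str:
    """Pack the vector as little-endian float32 values, then base64-encode the bytes."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def decode_vector_base64(encoded: str) -> List[float]:
    """Inverse of encode_vector_base64 (4 bytes per float32 element)."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

Values that are not exactly representable in float32 are rounded when packed, so a decoded vector may differ slightly from the Python floats that went in.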

Elements of a vector attribute must have the same number of dimensions.

A namespace can currently be created with up to 2 vector columns. The number of vector columns cannot be changed after namespace creation.

To use f16 vectors within the database, the relevant vector attribute must be explicitly specified in the schema with an f16 type (e.g. [512]f16) when first creating the namespace. This does not affect the base64 vector encoding in the API, which always uses a little-endian float32 binary format.

Vector attributes require an ANN index, configured via the ann schema parameter.

Multiple vector columns

A namespace can have multiple vector columns, each with independent dimensions and types. Vector columns must be declared in the schema and are fixed at namespace creation time.

Pricing: Each vector column has its own ANN index. Filterable attributes are indexed per ANN index, so their write and storage costs scale with the number of vector columns. Non-filterable attributes are stored once regardless of the number of vector columns. See attribute billing for details.

import turbopuffer

tpuf = turbopuffer.Turbopuffer(
    region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)

ns = tpuf.namespace('write-multivec-example-py')

ns.write(
    upsert_rows=[
        {
            'id': 1,
            'title_embedding': [0.1, 0.2, 0.3],
            'image_embedding': [0.4, 0.5],
            'title': 'hello world',
        },
        {
            'id': 2,
            'title_embedding': [0.4, 0.5, 0.6],
            'image_embedding': [0.7, 0.8],
            'title': 'goodbye world',
        },
    ],
    distance_metric='cosine_distance',
    schema={
        'title_embedding': {
            'type': '[3]f32',
            'ann': True,
        },
        'image_embedding': {
            'type': '[2]f16',
            'ann': True,
        },
    },
)

Schema

turbopuffer maintains a schema for each namespace with type and indexing behaviour for each attribute. By default, types are automatically inferred from the passed data and every attribute is indexed. To inspect the schema, use the metadata endpoint.

To customize indexing behavior or to specify types that cannot be automatically inferred (e.g. uuid), you can pass a schema object in a write request. This can be done on every write, or only the first; there's no performance difference. If a new attribute is added, this attribute will default to null for any documents that existed before the attribute was added.

Changing the attribute type of an existing attribute is currently an error.

For an example, see Configuring the schema.

type
string, required: true

The data type of the attribute. Supported types:

  • string: String
  • int: Signed integer (i64)
  • uint: Unsigned integer (u64)
  • float: Floating-point number (f64)
  • uuid: 128-bit UUID
  • datetime: Date and time
  • bool: Boolean
  • []string: Array of strings
  • []int: Array of signed integers
  • []uint: Array of unsigned integers
  • []float: Array of floating-point numbers
  • []uuid: Array of UUIDs
  • []datetime: Array of dates and times
  • []bool: Array of booleans
  • [N]f32: N dimensional f32 vector
  • [N]f16: N dimensional f16 vector
  • {}f16: Sparse vector with string keys and 16-bit floats as weights. Example: {"dim0": 0.1, "dim1": 0.2}.

All attributes are nullable, except for id.

string, int, and bool types and their array variants can be inferred from the write payload. Other types, such as uint or uuid, must be set explicitly in the schema. See UUID values for an example.

datetime values should be provided as an ISO 8601 formatted string with a mandatory date and optional time and time zone. Internally, these values are converted to UTC (if the time zone is specified) and stored as a 64-bit integer representing milliseconds since the epoch.

example
[ "2015-01-20", "2015-01-20T12:34:56", "2015-01-20T12:34:56-04:00" ]
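
To illustrate the stored representation, the sketch below converts the three example strings to milliseconds since the Unix epoch. The function name is illustrative, and treating a value without a time zone as UTC is an assumption of this sketch, not something stated above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: str) -> int:
    """Parse an ISO 8601 date or datetime string and return UTC milliseconds since the epoch."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: values without a time zone are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division by a 1 ms timedelta avoids float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2015-01-20"))                 # 1421712000000
print(to_epoch_millis("2015-01-20T12:34:56"))        # 1421757296000
print(to_epoch_millis("2015-01-20T12:34:56-04:00"))  # 1421771696000
```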

{}f16 attributes are not filterable and require indexing for fast SparseKNN operations.


ann
boolean, required: true for vector types

Must be set to true for vector type attributes ([N]f32, [N]f16). Builds an approximate nearest neighbor index for the vector column, enabling fast vector queries via rank_by.

example
{ "my_vector": { "type": "[512]f16", "ann": true } }

filterable
boolean, default: true (false if full-text search or regex is enabled)

Whether or not the attribute can be used in filters/WHERE clauses. Filtered attributes are indexed into an inverted index. At query-time, the filter evaluation is recall-aware when used for vector queries.

Unfiltered attributes don't have an index built for them, and are thus billed at a 50% discount (see pricing).


regex
boolean, default: false

Whether to enable Regex filters on this attribute. If set, filterable defaults to false; you can override this by setting filterable: true.


glob
boolean, default: false

Whether to enable Glob filters on this attribute. If set, filterable defaults to false; you can override this by setting filterable: true.


fuzzy
boolean, default: false

Whether to enable Fuzzy filters on this attribute. If set, filterable defaults to false; you can override this by setting filterable: true.



sparse_knn
object, default: unset

When configured, this attribute can be used as part of a SparseKNN query. This is only supported on the {}f16 type.

This requires a distance_metric string field, which only supports dot_product as a value at the moment.

Updating attributes

We support online, in-place changes of the following schema attributes:

  • filterable
  • full_text_search
  • regex
  • glob
  • fuzzy

The write does not need to include any documents, i.e. {"schema": ...} is supported, provided the namespace already exists.

Other index settings changes, attribute type changes, and attribute deletions currently cannot be done in-place. Consider exporting documents and upserting into a new namespace if you require a schema change.

After enabling the filterable, full_text_search, regex, glob, or fuzzy setting for an existing attribute, the index needs time to build before queries that depend on the index can be executed. turbopuffer will respond with HTTP status 202 to queries that depend on an index that is not yet built.

Changing full-text search parameters also requires that the index be rebuilt. turbopuffer will do this automatically in the background, during which time queries will continue returning results using the previous full-text search settings.

Billing

An unindexed attribute is billed at 50% of its logical size. Indexed attributes are billed at their logical size multiplied by the number of indexes they have enabled. For example, an attribute with filterable: true and full_text_search: true is billed at 200% of its logical size.
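
The billing rule above works out as follows. This is a simplified sketch with a hypothetical function name: it covers only the per-attribute multiplier and ignores other factors described elsewhere, such as multiple vector columns or batch discounts.

```python
def billed_bytes(logical_bytes: int, index_count: int) -> float:
    """Billed size of one attribute given how many indexes it has enabled."""
    if index_count == 0:
        # Unindexed attributes are billed at 50% of their logical size.
        return logical_bytes * 0.5
    # Indexed attributes are billed once per enabled index.
    return float(logical_bytes * index_count)


print(billed_bytes(100, 0))  # 50.0  (filterable: false, no other index)
print(billed_bytes(100, 1))  # 100.0 (filterable: true)
print(billed_bytes(100, 2))  # 200.0 (filterable: true and full_text_search: true)
```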

Examples

Row-based writes

Row-based writes may be more convenient than column-based writes. You can pass any combination of upsert_rows, patch_rows, patch_by_filter, deletes, and delete_by_filter to the write request.

If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.

import turbopuffer

tpuf = turbopuffer.Turbopuffer(
    region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)

ns = tpuf.namespace('write-upsert-row-example-py')
# Raises turbopuffer.APIError if the request still fails after being retried.
ns.write(
    upsert_rows=[
        {
            'id': 1,
            'vector': [0.1, 0.1],
            'my-string': 'one',
            'my-uint': 12,
            'my-bool': True,
            'my-string-array': ['a', 'b']
        },
        {
            'id': 2,
            'vector': [0.2, 0.2],
            'my-string-array': ['b', 'd']
        },
    ],
    patch_rows=[
        {
            'id': 3,
            'my-bool': True
        },
    ],
    deletes=[4],
    distance_metric='cosine_distance'
)

Configuring the schema

The schema can be passed on writes to manually configure attribute types and indexing behavior. A few examples where manually configuring the schema is needed:

  1. UUID values serialized as strings can be stored in turbopuffer in an optimized format.
  2. Enabling full-text search, regex, glob, or fuzzy indexing for string attributes.
  3. Disabling indexing/filtering (filterable:false) on an attribute, for a 50% discount and improved indexing performance.

An example of (1), (2), and (3):

import turbopuffer

tpuf = turbopuffer.Turbopuffer(
    region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)

ns = tpuf.namespace('write-schema-example-py')

ns.write(
    upsert_rows=[
        {
            'id': "769c134d-07b8-4225-954a-b6cc5ffc320c",
            'vector': [0.1, 0.1],
            'text': 'the fox is quick and brown',
            'string': 'fox',
            'permissions': ['ee1f7c89-a3aa-43c1-8941-c987ee03e7bc', '95cdf8be-98a9-4061-8eeb-2702b6bbcb9e']
        },
    ],
    distance_metric='cosine_distance',
    schema={
        'id': 'uuid',
        'text': {
            'type': 'string',
            'full_text_search': True # sets filterable: false, and enables FTS with default settings
        },
        'permissions': {
            'type': '[]uuid', # otherwise inferred as slower/more expensive []string
        }
    }
)

Column-based writes

Bulk document operations should use a column-oriented layout for best performance. You can pass any combination of upsert_columns, patch_columns, deletes, and delete_by_filter to the write request.

If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.

import turbopuffer

tpuf = turbopuffer.Turbopuffer(
    region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)

ns = tpuf.namespace('write-upsert-columns-example-py')
# Raises turbopuffer.APIError if the request still fails after being retried.
ns.write(
    upsert_columns={
        'id': [1, 2, 3, 4],
        'vector': [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]],
        'my-string': ['one', None, 'three', 'four'],
        'my-uint': [12, None, 84, 39],
        'my-bool': [True, None, False, True],
        'my-string-array': [['a', 'b'], ['b', 'd'], [], ['c']]
    },
    patch_columns={
        'id': [5, 6],
        'my-bool': [True, False],
    },
    deletes=[7, 8],
    distance_metric='cosine_distance'
)

Conditional writes

To make writes conditional, use the upsert_condition, patch_condition, and delete_condition parameters. These let you specify a condition that must be satisfied for the write operation to each document to proceed.

Conditional deletes are distinct from delete_by_filter, which should be used when the set of IDs to conditionally delete is not known ahead of time.

Conditions are evaluated before each write, using the current value of the document with the matching ID.

  • If the document exists and the condition is met, the write is applied.
  • If the document exists and the condition is not met, the write is skipped.
  • If the document does not exist, the write is applied unconditionally for upserts and skipped unconditionally for patches and deletes.

The operation will return the actual number of documents written (upserted, patched, or deleted).

Internally, the operation performs a query (one per batch) to determine which documents match the condition, so it is billed as both a query and a write operation. However, if the condition is not met for a given document, that write is skipped and not billed.

The condition syntax matches the filters parameter in the query API, with an additional feature: you can reference the new value being written using $ref_new references. These look like {"$ref_new": "attr_123"} and can be used in place of value literals. This allows the condition to vary by document in a multi-document write request.

Two common patterns:

  • Version checks: Set upsert_condition to ["version", "Lt", {"$ref_new": "version"}] to only apply writes when the new document has a higher version than the existing one.
  • Insert if not exists: Set upsert_condition to ["id", "Eq", null] to only insert documents that don't already exist. Since existing documents always have a non-null id, this condition fails for existing documents (skipping the write), while new documents are inserted unconditionally.

import turbopuffer

tpuf = turbopuffer.Turbopuffer(
    region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)

ns = tpuf.namespace('write-conditional-example-py')

ns.write(
    upsert_rows=[
        {
            'id': 101,
            'vector': [0.2, 0.8],
            'title': 'LISP Guide for Beginners (draft_v2)',
            'version': 2
        },
        {
            'id': 102,
            'vector': [0.4, 0.4],
            'title': 'AI for Practitioners (final)',
            'version': 5
        }
    ],
    distance_metric='cosine_distance'
)

# Conditionally upsert documents with new titles, making sure no version
# regression occurs.
result = ns.write(
    upsert_rows=[
        {
            'id': 101,
            'vector': [0.2, 0.8],
            'title': 'LISP Guide for Beginners (final)',
            'version': 3
        },
        {
            'id': 102,
            'vector': [0.4, 0.4],
            'title': 'AI for Practitioners (draft_v4)',
            'version': 4
        },
        {
            'id': 103,
            'vector': [0.6, 0.8],
            'title': 'Database Internals (draft_v1)',
            'version': 1
        }
    ],
    upsert_condition=('version', 'Lt', {'$ref_new': 'version'}),
    distance_metric='cosine_distance'
)
print(result.rows_affected) # 2

results = ns.query(limit=10, include_attributes=True)
print(results.rows)
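
To see why the example above reports 2 affected rows, the conditional upsert rules can be modeled locally with a dict standing in for the namespace. This is an illustrative model only, not turbopuffer's implementation: it supports just the Eq, Lt, And, and Or operators, and it assumes Lt never matches when either side is null.

```python
from typing import Any, Dict, List


def resolve(value: Any, new_doc: Dict[str, Any]) -> Any:
    """Replace a {"$ref_new": attr} reference with the incoming document's value."""
    if isinstance(value, dict) and "$ref_new" in value:
        return new_doc.get(value["$ref_new"])
    return value


def matches(cond: Any, current: Dict[str, Any], new_doc: Dict[str, Any]) -> bool:
    """Evaluate a condition against the current document."""
    if len(cond) == 2 and cond[0] in ("And", "Or"):
        results = [matches(c, current, new_doc) for c in cond[1]]
        return all(results) if cond[0] == "And" else any(results)
    attr, op, value = cond
    left, right = current.get(attr), resolve(value, new_doc)
    if op == "Eq":
        return left == right
    if op == "Lt":
        # Assumption of this model: comparisons involving null never match.
        return left is not None and right is not None and left < right
    raise ValueError(f"unsupported operator: {op}")


def conditional_upsert(store: Dict[Any, Dict[str, Any]], rows: List[Dict[str, Any]], condition: Any) -> int:
    """Apply each row if its ID is new or the condition holds; return rows applied."""
    applied = 0
    for row in rows:
        current = store.get(row["id"])
        if current is None or matches(condition, current, row):
            store[row["id"]] = row
            applied += 1
    return applied


store = {101: {"id": 101, "version": 2}, 102: {"id": 102, "version": 5}}
rows = [{"id": 101, "version": 3}, {"id": 102, "version": 4}, {"id": 103, "version": 1}]
applied = conditional_upsert(store, rows, ["version", "Lt", {"$ref_new": "version"}])
print(applied)  # 2: 101 is updated, 102 is skipped, 103 is created

# Insert if not exists: existing IDs are skipped, new IDs are created.
inserted = conditional_upsert(store, [{"id": 101, "version": 9}, {"id": 104, "version": 1}], ["id", "Eq", None])
print(inserted)  # 1
```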

Delete by filter

To delete documents that match a filter, use delete_by_filter. This operation will return the actual number of documents removed.

Because the operation internally issues a query to determine which documents to delete, this operation is billed as both a query and a write operation.

If delete_by_filter is used in the same request as other write operations, delete_by_filter will be applied before the other operations. This allows you to delete rows that match a filter before writing new rows with overlapping IDs. Note that patches to any deleted rows are ignored.

delete_by_filter has the same syntax as the filters parameter in the query API.

import turbopuffer

tpuf = turbopuffer.Turbopuffer(
    region="gcp-us-central1", # choose best region: https://turbopuffer.com/docs/regions
)

ns = tpuf.namespace('write-delete-by-filter-example-py')

ns.write(
    upsert_rows=[
        {
            "id": 101,
            "vector": [0.2, 0.8],
            "title": "LISP Guide for Beginners",
            "views": 10,
        },
        {
            "id": 102,
            "vector": [0.4, 0.4],
            "title": "AI for Practitioners",
            "views": 2500,
        },
    ],
    distance_metric="cosine_distance",
)

# Delete posts with titles that include the word "guide"
# and have 1000 or fewer views
result = ns.write(
    delete_by_filter=("And", [("title", "IGlob", "*guide*"), ("views", "Lte", 1000)])
)
print(result.rows_affected)  # 1

results = ns.query(rank_by=("id", "asc"), limit=10)
print(len(results.rows))  # 1

Patch by filter

To patch a dynamically determined set of documents, use patch_by_filter. This operation will return the actual number of documents updated. When rows_remaining is set to true in the response, there may be more documents matching your filter that can be patched; issue a follow-up request to patch those documents.

Because this operation uses a query to determine which rows need to be patched, it will be billed as a query & a patch operation (1 write, 2 queries total).

If patch_by_filter is used in the same request as other write operations, it is applied after delete_by_filter but before any other write operations. patch_by_filter will not apply to any rows which were deleted.

import turbopuffer

tpuf = turbopuffer.Turbopuffer(
    region="gcp-us-central1", # choose best region: https://turbopuffer.com/docs/regions
)

ns = tpuf.namespace('write-patch-by-filter-example-py')

ns.write(
    upsert_rows=[
        {
            "id": 101,
            "vector": [0.2, 0.8],
            "title": "LISP Guide for Beginners",
            "views": 10,
            "status": "published",
        },
        {
            "id": 102,
            "vector": [0.4, 0.4],
            "title": "AI for Practitioners",
            "views": 2500,
            "status": "published",
        },
        {
            "id": 103,
            "vector": [0.6, 0.3],
            "title": "Rust Basics",
            "views": 50,
            "status": "published",
        },
    ],
    distance_metric="cosine_distance",
)

# Archive posts that are published and have 100 or fewer views
result = ns.write(
    patch_by_filter={
        "filters": ("And", [("status", "Eq", "published"), ("views", "Lte", 100)]),
        "patch": {"status": "archived"},
    }
)
print(result.rows_affected)  # 2

results = ns.query(rank_by=("id", "asc"), include_attributes=["status"], limit=10)
for row in results.rows:
    print(f"ID {row['id']}: {row['status']}")  # IDs 101 and 103 are now archived