POST /v2/namespaces/:namespace
Creates, updates, or deletes documents.
A :namespace is an isolated set of documents and is implicitly created when
the first document is inserted. Namespace names must match [A-Za-z0-9-_.]{1,128}.
We recommend creating a namespace per isolated document space instead of filtering when possible. Large batches of writes are highly encouraged to maximize throughput and minimize cost. Write requests can have a payload size of up to 512 MB. See Performance.
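Large batches are encouraged, but each request is capped at 512 MB, so a client usually splits its rows into batches that stay under the cap. The sketch below shows one way to do that. `batch_rows` is a hypothetical helper, not part of the turbopuffer client. It measures only the JSON size of the `upsert_rows` array and reads "512 MB" as decimal megabytes, so leave some headroom for the rest of the request body.

```python
import json
from typing import Any, Iterator

# Hypothetical helper, not part of the turbopuffer client. "512 MB" is read as
# decimal megabytes, the more conservative reading of the documented limit.
MAX_REQUEST_BYTES = 512 * 10**6


def batch_rows(
    rows: list[dict[str, Any]], max_bytes: int = MAX_REQUEST_BYTES
) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive batches whose compact JSON array encoding fits in max_bytes."""
    batch: list[dict[str, Any]] = []
    size = 2  # the enclosing "[" and "]"
    for row in rows:
        # Compact separators keep the estimate close to what is actually sent;
        # the +1 accounts for the comma between array elements.
        row_size = len(json.dumps(row, separators=(",", ":")).encode("utf-8")) + 1
        if row_size + 2 > max_bytes:
            raise ValueError(f"row {row.get('id')!r} alone exceeds {max_bytes} bytes")
        if batch and size + row_size > max_bytes:
            yield batch
            batch, size = [], 2
        batch.append(row)
        size += row_size
    if batch:
        yield batch
```

You can then pass each batch as `upsert_rows` in its own `ns.write(...)` call.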
Within a namespace, documents are uniquely referred to by their ID. Document IDs are unsigned 64-bit integers, 128-bit UUIDs, or strings up to 64 bytes.
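You can check the naming and ID rules above on the client before sending a request. This is a minimal sketch with hypothetical helper names. It assumes UUID IDs are passed as `uuid.UUID` objects and measures string IDs in UTF-8 bytes. The server remains the authority on what it accepts.

```python
import re
import uuid

# Namespace names must match [A-Za-z0-9-_.]{1,128}; the hyphen is moved to the
# end of the character class so it is unambiguously a literal character.
_NAMESPACE_RE = re.compile(r"[A-Za-z0-9_.-]{1,128}")


def is_valid_namespace_name(name: str) -> bool:
    return _NAMESPACE_RE.fullmatch(name) is not None


def is_valid_document_id(doc_id: object) -> bool:
    """Check an ID against the documented forms: u64, 128-bit UUID, or string up to 64 bytes."""
    if isinstance(doc_id, bool):  # bool is a subclass of int in Python; reject it
        return False
    if isinstance(doc_id, int):
        return 0 <= doc_id < 2**64
    if isinstance(doc_id, uuid.UUID):
        return True
    if isinstance(doc_id, str):
        return len(doc_id.encode("utf-8")) <= 64
    return False
```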
turbopuffer supports the following types of writes:
- Upserts: creates or overwrites an entire document.
- Patches: updates one or more attributes of an existing document.
- Deletes: deletes an entire document by ID.
- Conditional writes: upsert, patch, or delete a document only if a condition is met.
- Patch by filter: patches documents that match a filter.
- Delete by filter: deletes documents that match a filter.
- Copy from namespace: copies all documents from another namespace.
- Branch from namespace: instantly creates a copy-on-write clone of a namespace.
Request
Upserts documents in a row-based format. Each row is an object with an id field (the document ID)
and any number of other attribute fields.
Existing documents with matching IDs are overwritten entirely. Use patch_rows to update only specific attributes.
A namespace may or may not have vector indexes. If it does, all documents must include all vector attributes.
[
{
"id": 1,
"vector": [1, 2, 3],
"name": "foo"
},
{
"id": 2,
"vector": [4, 5, 6],
"name": "bar"
}
]
Upserts documents in a column-based format. This field is an object, where each key is the name of a column, and each value is an array of values for that column.
Existing documents with matching IDs are overwritten entirely. Use patch_columns to update only specific attributes.
The id key is required, and must contain an array of document IDs. All vector attribute columns are required if
the namespace has vector indexes. Other keys will be stored as attributes.
Each column must be the same length. When a document doesn't have a value for a
given column, pass null.
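If your data starts out as rows, you can build the column layout described above on the client. In this sketch, `rows_to_columns` is a hypothetical helper, not a client function. It pads missing attributes with `None`, which serializes to JSON null. It also rejects repeated IDs up front, because a request that repeats a document ID fails with HTTP 400.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Convert row-based documents into the column-based layout.

    Every column gets exactly one entry per row; rows that lack an attribute
    get None (JSON null), so all columns have the same length.
    """
    ids = [row["id"] for row in rows]
    if len(set(ids)) != len(ids):
        # A request that repeats a document ID is rejected with HTTP 400.
        raise ValueError("duplicate document IDs in one write request")
    # Collect column names in first-seen order, keeping "id" first.
    names: list[str] = ["id"]
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    return {name: [row.get(name) for row in rows] for name in names}
```

Applied to the two rows in the upsert_rows example above, it yields exactly the column object shown next.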
{
"id": [1, 2],
"vector": [[1, 2, 3], [4, 5, 6]],
"name": ["foo", "bar"]
}
Patches documents in a row-based format. Identical to
upsert_rows, but instead of overwriting entire
documents, only the specified keys are written.
Vector attributes currently cannot be patched. To change a vector, retrieve the document and upsert it in full.
Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.
[
{
"id": 1,
"name": "baz"
},
{
"id": 2,
"name": "qux"
}
]
Patches are billed for the size of the patched attributes (not the full written documents), plus the cost of one query per write request (to read all the patched documents touched by the request).
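To make the patch semantics above concrete, here is an in-memory model of them. Only the listed keys change, IDs that don't exist are skipped, and vector attributes are rejected. `apply_patches` and its `vector_attrs` parameter are hypothetical. The model is meant for reasoning about results; it is not how turbopuffer executes patches.

```python
from typing import Any


def apply_patches(
    docs: dict[Any, dict[str, Any]],
    patch_rows: list[dict[str, Any]],
    vector_attrs: tuple[str, ...] = ("vector",),
) -> list[Any]:
    """Apply row-based patches to an in-memory {id: document} map; return patched IDs."""
    # Validate the whole request first, so an invalid patch changes nothing.
    for patch in patch_rows:
        if any(attr in patch for attr in vector_attrs):
            # Vector attributes cannot be patched; upsert the whole document instead.
            raise ValueError(f"cannot patch vector attribute(s) on id {patch['id']!r}")
    patched = []
    for patch in patch_rows:
        doc = docs.get(patch["id"])
        if doc is None:
            continue  # patches never create missing documents
        for key, value in patch.items():
            if key != "id":
                doc[key] = value  # only the specified keys are written
        patched.append(patch["id"])
    return patched
```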
Patches documents in a column-based format. Identical to
upsert_columns, but instead of overwriting entire
documents, only the specified keys are written.
Vector attributes currently cannot be patched. To change a vector, retrieve the document and upsert it in full.
Any patches to IDs that don't already exist in the namespace will be ignored; patches will not create any missing documents.
{
"id": [1, 2],
"name": ["baz", "qux"]
}
Deletes documents by ID. Must be an array of document IDs.
[
1,
2,
3
]
Makes each write in upsert_rows and
upsert_columns conditional on the
upsert_condition being satisfied for the document with the corresponding ID.
The upsert_condition is evaluated before each write, using the current value
of the document with the matching ID.
- If the document exists and the condition is met, the write is applied (i.e. the document is updated).
- If the document exists and the condition is not met, the write is skipped.
- If the document does not exist, the write is applied unconditionally (i.e. the document is created).
The condition syntax matches the filters parameter in the query
API, with an additional feature: you can reference the new value
being written using $ref_new references. These look like {"$ref_new": "attr_123"}
and can be used in place of value literals.
[
"Or",
[
[
"updated_at",
"Lt",
{
"$ref_new": "updated_at"
}
],
["updated_at", "Eq", null]
]
]
The newer timestamp example above ensures that each upsert is only processed if the
new document has a newer updated_at timestamp than its current version (or the
current version has no updated_at value).
An insert if not exists condition, ["id", "Eq", null], ensures that each upsert only inserts new
documents, skipping any writes where a document with that ID already exists.
Since existing documents always have a non-null id, this condition fails for
them, while new documents are inserted unconditionally.
Like upsert_condition, but for patch_rows and
patch_columns.
Any patches to IDs that don't already exist in the namespace will be ignored without evaluating the condition; patches will not create any missing documents.
Does not apply to patch_by_filter. Prefer this over patch_by_filter when
the set of IDs to conditionally patch is known ahead of time.
Like upsert_condition, but for deletes.
$ref_new references are given a null value for all attributes.
Does not apply to delete_by_filter. Prefer this over delete_by_filter when
the set of IDs to conditionally delete is known ahead of time.
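You can model the evaluation rules above locally, including the different handling of missing documents for upserts versus patches and deletes. This helps when predicting `rows_affected`. The sketch is simplified: `evaluate` and `should_write` are hypothetical helpers, and only And, Or, Eq, Lt, and Lte are covered. Treating an ordering comparison against null as non-matching is an assumption, not documented behavior.

```python
from typing import Any, Optional, Sequence

Condition = Sequence[Any]  # e.g. ["updated_at", "Lt", {"$ref_new": "updated_at"}]


def _resolve(value: Any, new_doc: dict[str, Any]) -> Any:
    # {"$ref_new": "attr"} stands for the value being written for that attribute.
    if isinstance(value, dict) and "$ref_new" in value:
        return new_doc.get(value["$ref_new"])
    return value


def evaluate(cond: Condition, current: dict[str, Any], new_doc: dict[str, Any]) -> bool:
    """Evaluate a small subset of the filter syntax (And, Or, Eq, Lt, Lte)."""
    if len(cond) == 2 and cond[0] in ("And", "Or"):
        results = [evaluate(c, current, new_doc) for c in cond[1]]
        return all(results) if cond[0] == "And" else any(results)
    attr, op, raw = cond
    left, right = current.get(attr), _resolve(raw, new_doc)
    if op == "Eq":
        return left == right  # a missing attribute compares equal to null
    if left is None or right is None:
        return False  # assumption: ordering comparisons against null never match
    if op == "Lt":
        return left < right
    if op == "Lte":
        return left <= right
    raise ValueError(f"operator {op!r} not covered by this sketch")


def should_write(
    kind: str,
    cond: Condition,
    current: Optional[dict[str, Any]],
    new_doc: Optional[dict[str, Any]] = None,
) -> bool:
    """Decide whether one conditional upsert, patch, or delete is applied."""
    if current is None:
        # Missing document: upserts apply unconditionally,
        # patches and deletes are skipped without evaluating the condition.
        return kind == "upsert"
    # Deletes carry no new document, so every $ref_new resolves to null.
    new = {} if kind == "delete" or new_doc is None else new_doc
    return evaluate(cond, current, new)
```

Applied to the conditional write example later on this page (versions 2 and 5 already stored, versions 3, 4, and 1 written), this model predicts two applied upserts.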
You can patch documents that match a filter using patch_by_filter.
It accepts an object with two fields:
- filters: a filter expression (see query filtering)
- patch: an object containing the patch to apply to all documents matching the filter
If patch_by_filter is used in the same request as other write operations, it is applied after delete_by_filter but before any other write operations.
Vector attributes currently cannot be patched. To change a vector, retrieve the document and upsert it in full.
{
"filters": [
"page_id",
"Eq",
123
],
"patch": {
"page_id": 124
}
}
patch_by_filter is billed as a write and two queries (one for the filter, one for the patch).
You can delete documents that match a filter using delete_by_filter.
It has the same syntax as the filters parameter in the query API.
If delete_by_filter is used in the same request as other write operations,
delete_by_filter will be applied before the other operations. This allows you
to delete rows that match a filter before writing new rows with overlapping IDs.
Note that patches to any deleted rows are ignored.
delete_by_filter is different from deletes with a delete_condition:
- delete_by_filter: searches across the namespace for any matching document IDs, deleting all matches that it finds.
- deletes + delete_condition: only evaluates the condition on the IDs identified in deletes.
delete_condition does not apply to delete_by_filter.
[
"page_id",
"Eq",
123
]
delete_by_filter is billed the same as normal deletes, plus the cost of one
query per write request (to determine which IDs to delete).
Allows patch_by_filter operations to succeed when the filter matches more than the maximum allowed number of documents.
When set to true, a patch_by_filter will update up to the maximum allowed number of documents, and set rows_remaining to true if any additional documents could match this filter. You should then issue another (potentially duplicate) request to
update additional matching documents.
When set to false, a patch_by_filter which matches more than the maximum allowed number of documents will fail and update no documents.
Allows delete_by_filter operations to succeed when the filter matches more than the maximum allowed number of documents.
When set to true, a delete_by_filter will delete up to the maximum allowed number of documents, and set rows_remaining to true if any additional documents could match this filter. You should then issue another (potentially duplicate) request to
delete additional matching documents.
When set to false, a delete_by_filter which matches more than the maximum allowed number of documents will fail and delete no documents.
If true, the response will include upserted_ids, patched_ids, and
deleted_ids arrays containing the IDs of documents that were successfully
written.
For conditional writes and filter-based operations, only IDs for writes that succeeded will be included.
The function used to calculate vector similarity. Possible values are cosine_distance or euclidean_squared.
cosine_distance is defined as 1 - cosine_similarity and ranges from 0 to 2.
Lower is better.
euclidean_squared is defined as sum((x - y)^2). Lower is better.
NOTE: This distance metric will apply to all vector columns configured for this namespace.
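For reference, here are both metrics written out directly. This plain-Python sketch of the definitions above is useful for sanity-checking scores, but it is not turbopuffer's implementation. For both metrics, a lower value means more similar.

```python
import math
from typing import Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine_similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite.

    Undefined for zero vectors (raises ZeroDivisionError here).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norms


def euclidean_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """sum((x - y)^2), i.e. squared Euclidean distance without the square root."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    return sum((a - b) ** 2 for a, b in zip(x, y))
```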
Copy all documents from another namespace into this one. The destination namespace you are copying into must be empty. The initial request currently cannot make schema changes or contain documents.
Copying is billed at up to a 75% write discount (a 50% copy discount that stacks
with the up to 50% discount for batched writes). This is a faster, cheaper
alternative to re-upserting documents for backups and namespaces that share
documents. See the cross-region backups guide for an example.
For same-region use cases, consider branch_from_namespace
which completes instantly regardless of namespace size.
For copies from another region, the logical size copied is also billed as returned bytes. Same-region copies do not bill returned bytes.
To copy a namespace from a different organization, region, or cloud provider, instead of providing the namespace as a string, provide an object with the following fields:
- source_namespace (string): the namespace to copy from
- source_api_key (string, optional): an API key for the organization containing the source namespace. Omit to copy from the same organization as the target namespace.
- source_region (string, optional): the region of the source namespace (e.g. "aws-us-east-1"). Omit to copy from the same region as the target namespace. Source and destination can be in different cloud providers (e.g. aws-us-east-1 → gcp-us-central1).
By default, the destination namespace will inherit the source namespace's encryption configuration. You can optionally specify a different encryption for the destination namespace. This allows you to copy from a namespace with default encryption to a namespace with customer-managed encryption, or vice-versa, or to use a different CMEK key than the source.
For cross-region copies from a namespace with customer-managed encryption, you must explicitly specify a destination encryption key available in the destination region.
"source-namespace"Copies of large namespaces can run asynchronously. See Asynchronous requests.
Creates an instant copy-on-write clone of the source namespace. The destination namespace must be empty.
After branching, both namespaces are fully independent — reads, writes, queries, and deletes on one namespace do not affect the other.
Branching is billed at a flat rate of $0.032. See the branching
guide for details, examples, and guidance on when to use
branching vs copy_from_namespace.
Example: "source-namespace"
By default, the schema is inferred from the passed data. See Schema below.
In some cases you need to specify the schema manually, for example to declare UUID types (which turbopuffer can't infer automatically), configure full-text search for an attribute, or disable filtering for an attribute.
{
"permissions": "[]uuid",
"text": {
"type": "string",
"full_text_search": true
},
"encrypted_blob": {
"type": "string",
"filterable": false
}
}
Only available as part of our scale and enterprise plans.
Setting a Customer Managed Encryption Key (CMEK) will encrypt all data in a namespace using a secret coming from your cloud KMS. Once set, all subsequent writes to this namespace will be encrypted, but data written prior to this upsert will be unaffected.
Currently, turbopuffer does not re-encrypt data when you rotate key versions, meaning old data will remain encrypted using older key versions, while fresh writes will be encrypted using the latest versions.
Revoking old key versions will cause data loss.
To re-encrypt your data using a more recent key, use the export API to re-upsert into a new namespace,
or use copy_from_namespace with a different encryption key to copy to a newly encrypted namespace.
{
"mode": "customer-managed",
"key_name": "projects/myproject/locations/us-central1/keyRings/EXAMPLE/cryptoKeys/KEYNAME"
}
Disables HTTP 429 backpressure on writes when unindexed data exceeds 2 GiB. Useful for initial data loading or bulk updates. When disabled, strongly consistent queries return errors above this threshold, so use eventual consistency instead. Eventually consistent queries search only the first 128 MiB of unindexed data.
Only takes effect for upserts and delete-by-id. Ignored for patch-by-id, patch-by-filter, delete-by-filter, and conditional writes, since those operations require a strongly consistent read of existing rows.
Indexing progress can be tracked through the unindexed_bytes field in the metadata endpoint.
Note that while data is being indexed, the following will not be updated:
- approx_row_count and approx_logical_bytes in the metadata endpoint
- Namespace row counts and sizes in the dashboard
Response
The total number of rows affected by the write request (sum of upserted, patched, and deleted rows).
The number of rows upserted by the write request. Only present when upsert_rows or upsert_columns is used.
The number of rows patched by the write request. Only present when patch_rows, patch_columns, or patch_by_filter is used.
When using patch_condition, this reflects only the rows where the condition was met and the patch was applied. Other patches were skipped.
The number of rows deleted by the write request. Only present when deletes or delete_by_filter is used.
When using delete_condition, this reflects only the rows where the condition was met and the deletion occurred. Other deletes were skipped.
Filter-based writes like delete_by_filter and patch_by_filter have a maximum
number of documents modified per write request. This ensures indexing and
consistent reads can keep up with writes and deletes. If this response field is
set to true, there are more documents that match the delete_by_filter or
patch_by_filter. You should issue another (potentially duplicate) request to
update additional matching documents.
The limits are currently:
- 5M documents for delete_by_filter
- 50k documents for patch_by_filter
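When partial filter-based writes are allowed, rows_remaining tells you to send the same request again. A minimal loop might look like the sketch below. `drain_filter_write` is a hypothetical helper. `write_once` is any zero-argument callable you supply, for example one that calls `ns.write(delete_by_filter=...)` with the partial-write option enabled. That option's parameter name is not shown in this section. The loop assumes each round makes progress, which holds when deleted or patched documents stop matching the filter. `max_rounds` caps it otherwise.

```python
from typing import Any, Callable


def drain_filter_write(write_once: Callable[[], Any], max_rounds: int = 100) -> int:
    """Repeat a partial patch_by_filter/delete_by_filter write until nothing remains.

    write_once sends one request and returns its response, which is expected to
    expose rows_affected and rows_remaining. Returns the total rows affected.
    """
    total = 0
    for _ in range(max_rounds):
        result = write_once()
        total += result.rows_affected
        if not getattr(result, "rows_remaining", False):
            return total
    # Guard against a filter that keeps matching (e.g. a patch that doesn't
    # change the filtered attribute), which would otherwise loop forever.
    raise RuntimeError(f"rows still remaining after {max_rounds} requests")
```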
The IDs of documents that were upserted. Only present when return_affected_ids
is true and at least one document was upserted.
The IDs of documents that were patched. Only present when return_affected_ids
is true and at least one document was patched.
The IDs of documents that were deleted. Only present when return_affected_ids
is true and at least one document was deleted.
The billable resources consumed by the write. The object contains the following fields:
- billable_logical_bytes_written (uint): the number of logical bytes written to the namespace
- query (object, optional): query billing information when the write involves a query-like operation (a conditional write, patch_by_filter, delete_by_filter, or a cross-region copy_from_namespace):
  - billable_logical_bytes_queried (uint): the number of logical bytes processed by queries
  - billable_logical_bytes_returned (uint): the number of logical bytes returned by queries
The performance metrics for the write. The object currently contains the following fields, but these fields may change name, type, or meaning in the future:
- server_total_ms (uint): request time measured on the server, in milliseconds
Attributes
Documents are composed of attributes. All documents must have a unique id attribute. Attribute names
can be up to 128 characters in length and must not start with a $ character.
By default, attributes are indexed and thus queries can filter or
sort by them. To disable indexing for an attribute, set
filterable to false in the schema for a 50% discount and
improved indexing performance. The attribute can still be returned, but not used for filtering or sorting.
Attributes must have consistent value types, and are nullable. The type is inferred from the first
occurrence of the attribute. Certain non-inferrable types, e.g. uuid or datetime, must be
specified in the schema.
Some limits apply to attribute sizes and number of attribute names per namespace. See Limits.
Vectors
Vectors are attributes with a vector type ([N]f32 or [N]f16 where N is the number of dimensions), encoded as either a JSON array of numbers, or as a base64-encoded string.
Attributes named vector are automatically inferred as having a vector type; additional vector columns must be explicitly declared in the schema.
If using the base64 encoding, the vector must be serialized in little-endian float32 binary format, then base64-encoded. The base64 string encoding can be more efficient on both the client and server.
All values of a vector attribute must have the same number of dimensions.
A namespace can currently be created with up to 2 vector columns. The number of vector columns cannot be changed after namespace creation.
To use f16 vectors within the database, the relevant vector attribute must be explicitly
specified in the schema with an f16 type (e.g. [512]f16) when first creating the
namespace. This does not affect the base64 vector encoding in the API, which
always uses a little-endian float32 binary format.
Vector attributes require an ANN index, configured via the ann schema parameter.
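The base64 encoding described above can be produced with the Python standard library alone. This sketch always packs float32, even for attributes declared as [N]f16, in line with the note that the API encoding is unaffected by the storage type. The helper names are hypothetical.

```python
import base64
import struct
from typing import Sequence


def encode_vector(vector: Sequence[float]) -> str:
    """Pack as little-endian float32 ('<' + 'f' per element), then base64-encode."""
    return base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode("ascii")


def decode_vector(encoded: str) -> list[float]:
    """Inverse of encode_vector; values come back rounded to float32 precision."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

The resulting string takes the place of the JSON array in a document, e.g. `{'id': 1, 'vector': encode_vector([0.1, 0.1])}` inside upsert_rows.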
Multiple vector columns
A namespace can have multiple vector columns, each with independent dimensions and types. Vector columns must be declared in the schema and are fixed at namespace creation time.
Pricing: Each vector column has its own ANN index. Filterable attributes are indexed per ANN index, so their write and storage costs scale with the number of vector columns. Non-filterable attributes are stored once regardless of the number of vector columns. See attribute billing for details.
import turbopuffer
tpuf = turbopuffer.Turbopuffer(
region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)
ns = tpuf.namespace(f'write-multivec-example-py')
ns.write(
upsert_rows=[
{
'id': 1,
'title_embedding': [0.1, 0.2, 0.3],
'image_embedding': [0.4, 0.5],
'title': 'hello world',
},
{
'id': 2,
'title_embedding': [0.4, 0.5, 0.6],
'image_embedding': [0.7, 0.8],
'title': 'goodbye world',
},
],
distance_metric='cosine_distance',
schema={
'title_embedding': {
'type': '[3]f32',
'ann': True,
},
'image_embedding': {
'type': '[2]f16',
'ann': True,
},
},
)
Schema
turbopuffer maintains a schema for each namespace with type and indexing behavior for each attribute. By default, types are automatically inferred from the passed data and every attribute is indexed. To inspect the schema, use the metadata endpoint.
To customize indexing behavior or to specify types that cannot be automatically inferred (e.g. uuid), you can pass a schema object in a write request. This can be done on every write, or only the first; there's no performance difference. If a new attribute is added, this attribute will default to null for any documents that existed before the attribute was added.
Changing the attribute type of an existing attribute is currently an error.
For an example, see Configuring the schema.
The data type of the attribute. Supported types:
- string: String
- int: Signed integer (i64)
- uint: Unsigned integer (u64)
- float: Floating-point number (f64)
- uuid: 128-bit UUID
- datetime: Date and time
- bool: Boolean
- []string: Array of strings
- []int: Array of signed integers
- []uint: Array of unsigned integers
- []float: Array of floating-point numbers
- []uuid: Array of UUIDs
- []datetime: Array of dates and times
- []bool: Array of booleans
- [N]f32: N-dimensional f32 vector
- [N]f16: N-dimensional f16 vector
- {}f16: Sparse vector with string keys and 16-bit floats as weights. Example: {"dim0": 0.1, "dim1": 0.2}.
All attributes are nullable, except for id.
string, int, and bool types and their array variants can be inferred from
the write payload. Other types, such as uint or uuid, must be set explicitly in the schema. See UUID
values for an example.
datetime values should be provided as an ISO 8601 formatted string with a
mandatory date and optional time and time zone. Internally, these values are
converted to UTC (if the time zone is specified) and stored as a 64-bit integer
representing milliseconds since the epoch.
[
"2015-01-20",
"2015-01-20T12:34:56",
"2015-01-20T12:34:56-04:00"
]
{}f16 attributes are not filterable and require indexing for fast SparseKNN operations.
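The datetime conversion described above (ISO 8601 in, UTC milliseconds since the epoch out) can be reproduced on the client, for example to predict stored values. The conversion itself happens on the server; this sketch only mirrors it. `datetime_to_epoch_ms` is a hypothetical helper. Treating a value without a time zone as UTC is an assumption. `datetime.fromisoformat` accepts the three example formats, though before Python 3.11 it rejects some other ISO 8601 spellings such as a trailing Z.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_epoch_ms(value: str) -> int:
    """Convert an ISO 8601 date or datetime string to UTC milliseconds since the epoch."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: a value without a time zone is interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division by a 1 ms timedelta avoids float rounding.
    return (dt - _EPOCH) // timedelta(milliseconds=1)
```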
Must be set to true for vector type attributes ([N]f32, [N]f16). Builds an approximate nearest neighbor index for the vector column, enabling fast vector queries via rank_by.
{
"my_vector": {
"type": "[512]f16",
"ann": true
}
}
Whether or not the attribute can be used in filters/WHERE clauses. Filterable attributes are indexed into an inverted index. At query time, the filter evaluation is recall-aware when used for vector queries.
Non-filterable attributes don't have an index built for them, and are thus billed at a 50% discount (see pricing).
Whether to enable Regex filters on this attribute. If set, filterable defaults to false; you can override this by setting filterable: true.
Whether to enable Glob filters on this attribute. If set, filterable defaults to false; you can override this by setting filterable: true.
Whether to enable Fuzzy filters on this attribute. If set, filterable defaults to false; you can override this by setting filterable: true.
Whether this attribute can be used as part of a BM25 full-text
search. Requires the string or []string type,
and by default, BM25-enabled attributes are not filterable. You can
override this by setting filterable: true.
Can either be a boolean for default settings, or an object with the following optional fields:
- tokenizer (string): How to convert the text to a list of tokens. Defaults to word_v3. The default is periodically upgraded for new namespaces. See: Supported tokenizers
- case_sensitive (boolean): Whether searching is case-sensitive. Defaults to false (i.e. case-insensitive).
- language (string): The language of the text. Defaults to english. See: Supported languages
- stemming (boolean): Language-specific stemming for the text. Defaults to false (i.e. do not stem).
- remove_stopwords (boolean): Removes common words from the text based on language. Defaults to false (i.e. keep common words).
- ascii_folding (boolean): Whether to convert each non-ASCII character in a token to its ASCII equivalent, if one exists (e.g., à -> a). Applied after stemming and stopword removal. Defaults to false (i.e., no folding).
- max_token_length (int): Maximum length of a token in bytes. Tokens longer than this are dropped during tokenization. Must be between 1 and 254 (inclusive). Defaults to 39.
- k1 (float): Term frequency saturation parameter for BM25 scoring. Must be greater than zero. Defaults to 1.2. See: Advanced tuning
- b (float): Document length normalization parameter for BM25 scoring. Must be in the range [0.0, 1.0]. Defaults to 0.75. See: Advanced tuning
- k3 (float): Query term frequency saturation parameter for BM25 scoring. Must be greater than zero. Defaults to 8.0. See: Advanced tuning
If you require other types of full-text search options, please contact us.
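You can check the documented option names and ranges before the object goes into a schema. `fts_config` is a hypothetical helper that only checks what this list states; the server may enforce further rules, such as which tokenizer and language values exist.

```python
from typing import Any

# Option names and ranges as documented for full_text_search objects.
_FTS_OPTIONS = {
    "tokenizer", "case_sensitive", "language", "stemming", "remove_stopwords",
    "ascii_folding", "max_token_length", "k1", "b", "k3",
}


def fts_config(**options: Any) -> dict[str, Any]:
    """Validate full_text_search options on the client and return them as a dict."""
    unknown = set(options) - _FTS_OPTIONS
    if unknown:
        raise ValueError(f"unknown full_text_search option(s): {sorted(unknown)}")
    if "max_token_length" in options and not 1 <= options["max_token_length"] <= 254:
        raise ValueError("max_token_length must be between 1 and 254 (inclusive)")
    for name in ("k1", "k3"):
        if name in options and not options[name] > 0:
            raise ValueError(f"{name} must be greater than zero")
    if "b" in options and not 0.0 <= options["b"] <= 1.0:
        raise ValueError("b must be in the range [0.0, 1.0]")
    return dict(options)
```

A schema entry would then read, for example, `{'text': {'type': 'string', 'full_text_search': fts_config(stemming=True, remove_stopwords=True)}}`.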
When configured, this attribute can be used as part of a SparseKNN query.
This is only supported on the {}f16 type.
This requires a distance_metric string field, which only supports
dot_product as a value at the moment.
Updating attributes
We support online, in-place changes of the following schema attributes:
- filterable
- full_text_search
- regex
- glob
- fuzzy
The write does not need to include any documents, i.e. {"schema": ...} is supported, provided the namespace already exists.
Other index settings changes, attribute type changes, and attribute deletions currently cannot be done in-place. Consider exporting documents and upserting into a new namespace if you require a schema change.
After enabling the filterable, full_text_search, regex, glob, or fuzzy setting for an existing attribute, the index needs time to build before queries that depend on the index can be executed. turbopuffer will respond with HTTP status 202 to queries that depend on an index that is not yet built.
Changing full-text search parameters also requires that the index be rebuilt. turbopuffer will do this automatically in the background, during which time queries will continue returning results using the previous full-text search settings.
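You could wrap a schema-only write like the one described above as follows. This sketch assumes `ns` is a namespace handle from the Python client, as in the examples below, and that its write accepts schema on its own. It also assumes `attr_type` is the attribute's current type. `update_index_settings` is a hypothetical helper that refuses settings this section does not list as changeable in place.

```python
from typing import Any

# Settings this section lists as changeable in place on an existing attribute.
ONLINE_SETTINGS = {"filterable", "full_text_search", "regex", "glob", "fuzzy"}


def update_index_settings(ns: Any, attribute: str, attr_type: str, **settings: Any) -> None:
    """Send a schema-only write that changes index settings of an existing attribute.

    attr_type must be the attribute's current type: changing the type of an
    existing attribute is an error.
    """
    unsupported = set(settings) - ONLINE_SETTINGS
    if unsupported:
        raise ValueError(
            f"not changeable in place: {sorted(unsupported)}; "
            "export and upsert into a new namespace instead"
        )
    ns.write(schema={attribute: {"type": attr_type, **settings}})
```

For example, `update_index_settings(ns, "text", "string", full_text_search=True)`. Queries that depend on the new index may receive HTTP 202 until it has been built.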
Billing
An unindexed attribute is billed at 50% of its logical size. Indexed attributes are billed at their logical
size multiplied by the number of indexes they have enabled. For example, an attribute with filterable: true
and full_text_search: true is billed at 200% of its logical size.
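You can express the rule above as a small estimator. The sketch makes two assumptions. First, each enabled setting among filterable, full_text_search, regex, glob, and fuzzy counts as one index. Second, it ignores the per-vector-column scaling of filterable attributes described under Multiple vector columns. The filterable defaults follow the schema rules above. The helper names are hypothetical.

```python
from typing import Any

_OPT_IN_INDEXES = ("full_text_search", "regex", "glob", "fuzzy")


def is_filterable(config: dict[str, Any]) -> bool:
    """filterable defaults to True, unless one of the opt-in indexes is enabled."""
    if "filterable" in config:
        return bool(config["filterable"])
    return not any(config.get(k) not in (None, False) for k in _OPT_IN_INDEXES)


def billed_logical_bytes(logical_bytes: int, config: dict[str, Any]) -> float:
    """Estimate billed bytes: 50% if unindexed, else logical size times number of indexes."""
    indexes = int(is_filterable(config))
    # full_text_search may be True or an options object; both count as enabled.
    indexes += sum(config.get(k) not in (None, False) for k in _OPT_IN_INDEXES)
    return logical_bytes * (0.5 if indexes == 0 else indexes)
```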
Examples
Row-based writes
Row-based writes may be more convenient than column-based writes. You can pass
any combination of upsert_rows, patch_rows, patch_by_filter, deletes, and
delete_by_filter to the write request.
If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.
import turbopuffer
tpuf = turbopuffer.Turbopuffer(
region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)
ns = tpuf.namespace(f'write-upsert-row-example-py')
# If an error occurs, this call raises a turbopuffer.APIError if a retry was not successful.
ns.write(
upsert_rows=[
{
'id': 1,
'vector': [0.1, 0.1],
'my-string': 'one',
'my-uint': 12,
'my-bool': True,
'my-string-array': ['a', 'b']
},
{
'id': 2,
'vector': [0.2, 0.2],
'my-string-array': ['b', 'd']
},
],
patch_rows=[
{
'id': 3,
'my-bool': True
},
],
deletes=[4],
distance_metric='cosine_distance'
)
Configuring the schema
The schema can be passed on writes to manually configure attribute types and indexing behavior. A few examples where manually configuring the schema is needed:
- UUID values serialized as strings can be stored in turbopuffer in an optimized format.
- Enabling full-text search, regex, glob, or fuzzy indexing for string attributes.
- Disabling indexing/filtering (filterable: false) on an attribute, for a 50% discount and improved indexing performance.
An example of all three:
import turbopuffer
tpuf = turbopuffer.Turbopuffer(
region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)
ns = tpuf.namespace(f'write-schema-example-py')
ns.write(
upsert_rows=[
{
'id': "769c134d-07b8-4225-954a-b6cc5ffc320c",
'vector': [0.1, 0.1],
'text': 'the fox is quick and brown',
'string': 'fox',
'permissions': ['ee1f7c89-a3aa-43c1-8941-c987ee03e7bc', '95cdf8be-98a9-4061-8eeb-2702b6bbcb9e']
},
],
distance_metric='cosine_distance',
schema={
'id': 'uuid',
'text': {
'type': 'string',
'full_text_search': True # sets filterable: false, and enables FTS with default settings
},
'permissions': {
'type': '[]uuid', # otherwise inferred as slower/more expensive []string
}
}
)
Column-based writes
Bulk document operations should use a column-oriented layout for best
performance. You can pass any combination of upsert_columns, patch_columns,
deletes, and delete_by_filter to the write request.
If the same document ID appears multiple times in the request, the request will fail with an HTTP 400 error.
import turbopuffer
tpuf = turbopuffer.Turbopuffer(
region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)
ns = tpuf.namespace(f'write-upsert-columns-example-py')
# If an error occurs, this call raises a turbopuffer.APIError if a retry was not successful.
ns.write(
upsert_columns={
'id': [1, 2, 3, 4],
'vector': [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]],
'my-string': ['one', None, 'three', 'four'],
'my-uint': [12, None, 84, 39],
'my-bool': [True, None, False, True],
'my-string-array': [['a', 'b'], ['b', 'd'], [], ['c']]
},
patch_columns={
'id': [5, 6],
'my-bool': [True, False],
},
deletes=[7, 8],
distance_metric='cosine_distance'
)
Conditional writes
To make writes conditional, use the upsert_condition, patch_condition, and
delete_condition parameters. These let you specify a condition that each document
must satisfy for its write to proceed.
Conditional deletes are distinct from delete_by_filter,
which should be used when the set of IDs to conditionally delete is not known
ahead of time.
Conditions are evaluated before each write, using the current value of the document with the matching ID.
- If the document exists and the condition is met, the write is applied.
- If the document exists and the condition is not met, the write is skipped.
- If the document does not exist, the write is applied unconditionally for upserts and skipped unconditionally for patches and deletes.
The operation will return the actual number of documents written (upserted, patched, or deleted).
Internally, the operation performs a query (one per batch) to determine which documents match the condition, so it is billed as both a query and a write operation. However, if the condition is not met for a given document, that write is skipped and not billed.
The condition syntax matches the filters parameter in the query
API, with an additional feature: you can reference the new value
being written using $ref_new references. These look like {"$ref_new": "attr_123"}
and can be used in place of value literals. This allows the condition to vary by
document in a multi-document write request.
Two common patterns:
- Version checks: Set upsert_condition to ["version", "Lt", {"$ref_new": "version"}] to only apply writes when the new document has a higher version than the existing one.
- Insert if not exists: Set upsert_condition to ["id", "Eq", null] to only insert documents that don't already exist. Since existing documents always have a non-null id, this condition fails for existing documents (skipping the write), while new documents are inserted unconditionally.
import turbopuffer
tpuf = turbopuffer.Turbopuffer(
region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
)
ns = tpuf.namespace(f'write-conditional-example-py')
ns.write(
upsert_rows=[
{
'id': 101,
'vector': [0.2, 0.8],
'title': 'LISP Guide for Beginners (draft_v2)',
'version': 2
},
{
'id': 102,
'vector': [0.4, 0.4],
'title': 'AI for Practitioners (final)',
'version': 5
}
],
distance_metric='cosine_distance'
)
# Conditionally upsert documents with new titles, making sure no version
# regression occurs.
result = ns.write(
upsert_rows=[
{
'id': 101,
'vector': [0.2, 0.8],
'title': 'LISP Guide for Beginners (final)',
'version': 3
},
{
'id': 102,
'vector': [0.4, 0.4],
'title': 'AI for Practitioners (draft_v4)',
'version': 4
},
{
'id': 103,
'vector': [0.6, 0.8],
'title': 'Database Internals (draft_v1)',
'version': 1
}
],
upsert_condition=('version', 'Lt', {'$ref_new': 'version'}),
distance_metric='cosine_distance'
)
print(result.rows_affected) # 2
results = ns.query(limit=10, include_attributes=True)
print(results.rows)
Delete by filter
To delete documents that match a filter, use delete_by_filter. This operation will return
the actual number of documents removed.
Because the operation internally issues a query to determine which documents to delete, this operation is billed as both a query and a write operation.
If delete_by_filter is used in the same request as other write operations,
delete_by_filter will be applied before the other operations. This allows you
to delete rows that match a filter before writing new rows with overlapping IDs.
Note that patches to any deleted rows are ignored.
delete_by_filter has the same syntax as the filters parameter in the query API.
import turbopuffer
tpuf = turbopuffer.Turbopuffer(
region="gcp-us-central1", # choose best region: https://turbopuffer.com/docs/regions
)
ns = tpuf.namespace(f'write-delete-by-filter-example-py')
ns.write(
upsert_rows=[
{
"id": 101,
"vector": [0.2, 0.8],
"title": "LISP Guide for Beginners",
"views": 10,
},
{
"id": 102,
"vector": [0.4, 0.4],
"title": "AI for Practitioners",
"views": 2500,
},
],
distance_metric="cosine_distance",
)
# Delete posts with titles that include the word "guide"
# and have 1000 or less views
result = ns.write(
delete_by_filter=("And", [("title", "IGlob", "*guide*"), ("views", "Lte", 1000)])
)
print(result.rows_affected) # 1
results = ns.query(rank_by=("id", "asc"), limit=10)
print(len(results.rows)) # 1
Patch by filter
To patch a dynamically determined set of documents, use patch_by_filter. This operation will return the actual number of documents updated. When rows_remaining is set to true in the response, there may be more documents matching your filter that can be patched; issue a follow-up request to patch those documents.
Because this operation uses a query to determine which rows need to be patched, it will be billed as a query & a patch operation (1 write, 2 queries total).
If patch_by_filter is used in the same request as other write operations, it is applied after delete_by_filter but before any other write operations. patch_by_filter will not apply to any rows which were deleted.
import turbopuffer
tpuf = turbopuffer.Turbopuffer(
region="gcp-us-central1", # choose best region: https://turbopuffer.com/docs/regions
)
ns = tpuf.namespace(f'write-patch-by-filter-example-py')
ns.write(
upsert_rows=[
{
"id": 101,
"vector": [0.2, 0.8],
"title": "LISP Guide for Beginners",
"views": 10,
"status": "published",
},
{
"id": 102,
"vector": [0.4, 0.4],
"title": "AI for Practitioners",
"views": 2500,
"status": "published",
},
{
"id": 103,
"vector": [0.6, 0.3],
"title": "Rust Basics",
"views": 50,
"status": "published",
},
],
distance_metric="cosine_distance",
)
# Archive posts that are published and have 100 or fewer views
result = ns.write(
patch_by_filter={
"filters": ("And", [("status", "Eq", "published"), ("views", "Lte", 100)]),
"patch": {"status": "archived"},
}
)
print(result.rows_affected) # 2
results = ns.query(rank_by=("id", "asc"), include_attributes=["status"], limit=10)
for row in results.rows:
print(f"ID {row['id']}: {row['status']}") # IDs 101 and 103 are now archived