Query, filter, full-text search and vector search documents.
A query retrieves documents in a single namespace, returning the ordered or highest-ranked documents that match the query's filters.
turbopuffer supports the following types of queries:
How to rank the documents in the namespace. Supported ranking functions:
For hybrid search, you must do multiple queries (e.g. BM25 + vector) and combine the results client-side with e.g. reciprocal-rank fusion. We encourage users to write a strong query layer abstraction, as it's not uncommon to do several turbopuffer queries per user query. Soon turbopuffer will support multiple queries in the same request.
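The client-side fusion step mentioned above can be sketched with reciprocal-rank fusion. This is a minimal illustration rather than a turbopuffer API: the function name, the `k=60` constant (a commonly used default), and the sample IDs are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k is a smoothing constant chosen by the caller.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up IDs, as they might come back from one vector and one BM25 query.
vector_ids = [8, 20, 3]
bm25_ids = [20, 5, 8]
fused = reciprocal_rank_fusion([vector_ids, bm25_ids])
```

Documents that rank well in both lists rise to the top, which is why document 20 ends up ahead of document 8 here.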
Vector example: ["vector", "ANN", [0.1, 0.2, 0.3, ..., 76.8]]
BM25: ["text", "BM25", "fox jumping"]
Order by attribute example: ["timestamp", "desc"]
BM25 with multiple, weighted fields:
["Sum", [
["Product", [2, ["title", "BM25", "fox jumping"]]],
["content", "BM25", "fox jumping"]
]
]
Number of documents to return.
Maximum: 1200 (adjustable upon request)
Exact filters on attributes to refine search results. Think of it as a SQL WHERE clause.
See Filtering Parameters below for details.
When combined with a vector, the query planner will automatically combine the attribute index and the approximate nearest neighbor index for best performance and recall. See our post on Native Filtering for details.
For the best performance, separate documents into namespaces instead of filtering where possible. See also Performance.
Example: ["And", [["id", "Gte", 1000], ["permissions", "In", ["3d7a7296-3d6a-4796-8fb0-f90406b1f621", "92ef7c95-a212-43a4-ae4e-0ebc96a65764"]]]]
List of attribute names to return in the response. Can be set to true to return all attributes. Return only the ones you need for best performance.
Aggregations to compute over all documents in the namespace that match the filters.
Cannot be specified with rank_by, top_k, or include_attributes. We plan to lift these restrictions soon.
Each entry in the object maps a label for the aggregation to an aggregate function. Supported aggregate functions:
["Count", "attr"]: counts the number of documents with a non-null value for the attr attribute. Limitation: currently only the id attribute is supported.
Example: {"my_count_of_ids": ["Count", "id"]}
The encoding to use for the vectors in the response. The supported encodings are float and base64.
If float, vectors are returned as arrays of numbers.
If base64, vectors are returned as base64-encoded strings representing the vectors serialized in little-endian float32 binary format.
This parameter has no effect if the vector attribute is not included in the response (see the include_attributes parameter).
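Decoding a base64-encoded vector on the client could look like the following sketch. It relies only on the format described above (little-endian float32); the function name is hypothetical and the sample data is generated locally rather than taken from a real response.

```python
import base64
import struct


def decode_base64_vector(encoded):
    """Decode a base64 string of little-endian float32 values into floats."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4 (float32 size)")
    count = len(raw) // 4
    # "<" selects little-endian byte order, "f" is a 4-byte float.
    return list(struct.unpack(f"<{count}f", raw))


# Round trip with locally generated values that float32 represents exactly.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
decoded = decode_base64_vector(sample)
```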
Choose between strong and eventual read-after-write consistency.
{"level": "strong"}
{"level": "eventual"}
Strong consistency requires a round-trip to object storage to fetch the latest writes before returning a query result, ensuring up-to-date data but adding latency. Eventual consistency removes this requirement, potentially reducing latency while causing stale reads in some cases. Benchmarking on a vector workload (768 dims, 1M docs, ~3GB) shows a p50 warm latency of 16 ms for strong consistency and 10 ms for eventual consistency.
Most queries are served by the same node that handles writes, so updates are usually visible immediately. Over 99.99% of queries return consistent data. Here's a more specific breakdown based on our monitoring data:
% of queries | maximum lag (<= time) |
---|---|
99.9970% | 0s (strongly consistent) |
99.9973% | 1s |
99.9975% | 10s |
99.9976% | 60s |
100% | 1h |
In rare cases (e.g. namespace routing changes during scaling), reads may briefly return stale data until the cache updates. This staleness is typically limited to ~100ms (as the commit log entry is updated in the background), with a strict upper bound of 1 hour (currently non-configurable but subject to future tuning). However, the cache is refreshed on every query, so the latest writes should appear on the next request.
An array of the top_k documents that matched the query, ordered by the ranking function. Only present if rank_by is specified.
Each document is an object containing the requested attributes. The id attribute is always included. The special attribute $dist is set to the ranking function's score for the document (distance from the query vector for ANN; BM25 score for BM25; omitted when ordering by an attribute).
Example:
[
{"$dist": 1.7, "id": 8, "extra_attr": "puffer"},
{"$dist": 3.1, "id": 20, "extra_attr": "fish"}
]
An object mapping the label for each requested aggregation to the computed value. Only present if aggregate_by is specified.
Example:
{ "my_count_of_ids": 42 }
The billable resources consumed by the query. The object contains the following fields:
billable_logical_bytes_queried (uint): the number of logical bytes processed by the query
billable_logical_bytes_returned (uint): the number of logical bytes returned by the query
The performance metrics for the query. The object currently contains the following fields, but these fields may change name, type, or meaning in the future:
cache_hit_ratio (float): the ratio of cache hits to total cache lookups
cache_temperature (string): a qualitative description of the cache hit ratio (hot, warm, or cold)
server_total_ms (uint): request time measured on the server, including time spent waiting for other queries to complete if the namespace was at its concurrency limit
query_execution_ms (uint): request time measured on the server, excluding time spent waiting due to the namespace concurrency limit
exhaustive_search_count (uint): the number of unindexed documents processed by the query
approx_namespace_count (uint): the approximate number of documents in the namespace
Contact the turbopuffer team if you need help interpreting these metrics.
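Because server_total_ms includes time spent waiting on the namespace concurrency limit and query_execution_ms excludes it, their difference gives a rough estimate of queueing time. The sketch below assumes both fields are present in the performance object; the helper name and sample values are made up.

```python
def queue_wait_ms(performance):
    """Estimate time spent waiting on the namespace concurrency limit.

    Computed as server_total_ms - query_execution_ms, clamped at zero
    in case the two timers disagree slightly.
    """
    waited = performance["server_total_ms"] - performance["query_execution_ms"]
    return max(waited, 0)


# Made-up metrics for illustration only.
sample_performance = {"server_total_ms": 25, "query_execution_ms": 18}
wait = queue_wait_ms(sample_performance)
```

A consistently large gap may suggest the namespace is hitting its concurrency limit, though the field meanings may change as noted above.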
The query vector must have the same dimensionality as the vectors in the namespace being queried.
When you need to filter documents, you can combine filters with vector search or use them alone. Here's an example of finding recent public documents:
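A request body for that case might be assembled as in the sketch below. The attribute names (public, timestamp, title), the cutoff value, and the three-dimensional query vector are hypothetical; only the parameter names and filter syntax come from this page.

```python
import json

# Hypothetical cutoff: only documents at or after this Unix timestamp.
RECENT_CUTOFF = 1_700_000_000

query_body = {
    # Vector search; the query vector must match the namespace dimensionality.
    "rank_by": ["vector", "ANN", [0.1, 0.2, 0.3]],
    "top_k": 10,
    # Both conditions must hold: the document is public and recent.
    "filters": ["And", [
        ["public", "Eq", 1],
        ["timestamp", "Gte", RECENT_CUTOFF],
    ]],
    # Return only what is needed, for best performance.
    "include_attributes": ["title"],
}

payload = json.dumps(query_body)
```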
You can specify a rank_by parameter to order results by a specific attribute (i.e. SQL ORDER BY). For example, to order by timestamp in descending order:
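A minimal sketch of such a request body follows. The helper name is hypothetical, and the timestamp attribute is assumed to exist in the namespace.

```python
def order_by_query(attribute, direction="desc", top_k=10):
    """Build a query body that orders results by a single attribute."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return {"rank_by": [attribute, direction], "top_k": top_k}


# Newest documents first.
newest_first = order_by_query("timestamp", "desc")
```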
Ordering by multiple attributes isn't yet implemented.
Similar to SQL, the ordering of results is not guaranteed when multiple documents have the same attribute value for the rank_by parameter. Array attributes aren't supported.
To find all documents matching filters when order isn't important to you, rank by the id attribute, which is guaranteed to be present in every namespace:
"filters": [...],
"rank_by": ["id", "asc"],
"top_k": ...
If you expect more than top_k results, see Pagination.
You can aggregate attribute values across all documents in the namespace that match the query's filters using the aggregate_by parameter.
For example, to count the number of documents in a namespace:
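A sketch of a count request body is shown here. The helper name is hypothetical; the body omits rank_by, top_k, and include_attributes because aggregations cannot currently be combined with them.

```python
def count_query(label, filters=None):
    """Build a query body that counts matching documents by their id."""
    body = {"aggregate_by": {label: ["Count", "id"]}}
    if filters is not None:
        # Optional: restrict the count to documents matching the filters.
        body["filters"] = filters
    return body


# Count every document in the namespace.
count_all = count_query("my_count_of_ids")
```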
You cannot currently combine aggregations with rank_by. We plan to lift this restriction soon.
The FTS attribute must be configured with full_text_search set in the schema when writing documents. See Schema documentation and the Full-Text Search guide for more details.
For an example of hybrid search (combining both vector and BM25 results), see Hybrid Search.
You can combine BM25 full-text search with filters to limit results to a specific subset of documents.
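One way to put the two together is sketched below. The helper name and the public attribute are hypothetical, and the 1200 limit is the default maximum stated for top_k on this page.

```python
MAX_TOP_K = 1200  # default maximum; adjustable upon request


def bm25_query(field, text, filters, top_k=10):
    """Build a BM25 query body restricted to documents matching filters."""
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
    return {
        "rank_by": [field, "BM25", text],
        "filters": filters,
        "top_k": top_k,
    }


# Rank public documents by how well their text matches "fox jumping".
query_body = bm25_query("text", "fox jumping", ["public", "Eq", 1])
```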
FTS operators combine the results of multiple sub-queries into a single score. Specifically, the following operators are supported:
Sum: Sum the scores of the sub-queries.
Max: Use the maximum score of sub-queries as the score.
Operators can be nested. For example:
"rank_by": ["Sum", [
["Max", [
["title", "BM25", "whale facts"],
["description", "BM25", "whale facts"]
]],
["content", "BM25", "huge whale"]
]]
You can specify a weight / boost per-field by using the Product operator inside a rank_by.
For example, to apply a 2x score multiplier on the title sub-query:
"rank_by": ["Sum", [
["Product", [2, ["title", "BM25", "quick fox"]]],
["content", "BM25", "quick fox"]
]]
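To see how Sum, Max, and Product combine sub-query scores, the following sketch evaluates a rank_by expression locally for a single document. It only mirrors the semantics described above and is not how the server computes scores; the per-field BM25 scores are made up, and treating a missing score as 0 is an assumption.

```python
def combine_scores(expr, bm25_scores):
    """Evaluate a nested rank_by expression for one document.

    bm25_scores maps (field, query) to that sub-query's score.
    """
    op = expr[0]
    if op == "Sum":
        return sum(combine_scores(sub, bm25_scores) for sub in expr[1])
    if op == "Max":
        return max(combine_scores(sub, bm25_scores) for sub in expr[1])
    if op == "Product":
        weight, sub = expr[1]
        return weight * combine_scores(sub, bm25_scores)
    # Leaf sub-query of the form [field, "BM25", query].
    field, kind, query = expr
    if kind != "BM25":
        raise ValueError(f"unsupported ranking function: {kind}")
    return bm25_scores.get((field, query), 0.0)


weighted = ["Sum", [
    ["Product", [2, ["title", "BM25", "quick fox"]]],
    ["content", "BM25", "quick fox"],
]]
scores = {("title", "quick fox"): 1.5, ("content", "quick fox"): 0.75}
total = combine_scores(weighted, scores)  # 2 * 1.5 + 0.75
```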
A simple form of phrase matching is supported with the ContainsAllTokens filter. This filter matches documents that contain all the tokens present in the filter input string:
"filters": ["text", "ContainsAllTokens", "lazy walrus"]
Specifically, this filter would match a document containing "walrus is super lazy", but not a document containing only "lazy." Combining this with a Not filter can help exclude unwanted results:
"filters": ["Not", ["text", "ContainsAllTokens", "polar bear"]]
Full phrase matching, i.e. requiring the exact phrase "lazy walrus", with the terms adjacent and in that order, is not yet supported.
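The matching rule can be illustrated with a small local simulation. It assumes a naive tokenizer (lowercasing and splitting on whitespace); the tokenizer actually applied to a full-text search attribute may behave differently.

```python
def contains_all_tokens(text, query):
    """Return True if every token of query occurs somewhere in text.

    Token order and adjacency are ignored, so this is not phrase matching.
    """
    tokens = set(text.lower().split())
    return all(token in tokens for token in query.lower().split())


match = contains_all_tokens("walrus is super lazy", "lazy walrus")
no_match = contains_all_tokens("lazy", "lazy walrus")
# A Not filter simply inverts the result.
excluded = not contains_all_tokens("the polar bear sleeps", "polar bear")
```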
Filters allow you to narrow down results by applying exact conditions to attributes. Conditions are arrays with an attribute name, operation, and value, for example:
["attr_name", "Eq", 42]
["page_id", "In", ["page1", "page2"]]
["user_migrated_at", "NotEq", null]
Values must have the same type as the attribute's value, or an array of that type for operators like In.
Conditions can be combined using {And,Or} operations:
// basic And condition
"filters": ["And", [
["attr_name", "Eq", 42],
["page_id", "In", ["page1", "page2"]]
]]
// conditions can be nested
"filters": ["And", [
["page_id", "In", ["page1", "page2"]],
["Or", [
["public", "Eq", 1],
["permission_id", "In", ["3iQK2VC4", "wzw8zpnQ"]]
]]
]]
Filters can also be applied to the id field, which refers to the document ID.
Matches if all of the filters match.
Matches if at least one of the filters matches.
Matches if the filter does not match.
Exact match for id or attribute values. If value is null, matches documents missing the attribute.
Inverse of Eq, for attribute values. If value is null, matches documents with the attribute.
Matches any id or attribute values contained in the provided list. If both the provided value and the target document field are arrays, then this checks if any elements of the two sets intersect.
Inverse of In, matches any attribute values not contained in the provided list.
For ints, this is a numeric less-than on attribute values. For strings, lexicographic less-than. For datetimes, numeric less-than on millisecond representation.
For ints, this is a numeric less-than-or-equal on attribute values. For strings, lexicographic less-than-or-equal. For datetimes, numeric less-than-or-equal on millisecond representation.
For ints, this is a numeric greater-than on attribute values. For strings, lexicographic greater-than. For datetimes, numeric greater-than on millisecond representation.
For ints, this is a numeric greater-than-or-equal on attribute values. For strings, lexicographic greater-than-or-equal. For datetimes, numeric greater-than-or-equal on millisecond representation.
Unix-style glob match against string attribute values. The full syntax is described in the globset documentation. Glob patterns with a concrete prefix like "foo*" internally compile to efficient range queries.
Inverse of Glob, Unix-style glob filters against string attribute values. The full syntax is described in the globset documentation.
Case-insensitive version of Glob.
Case-insensitive version of NotGlob.
Matches if all tokens in the input string are present in the attribute value. Requires that the attribute is configured for full-text search.
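The semantics of a few of these operators can be checked with a small local evaluator. This sketch covers only And, Or, Not, Eq, NotEq, In, and Gte, represents a missing attribute as None, and assumes Gte never matches a missing attribute. It is an illustration, not the server's implementation.

```python
def matches(doc, flt):
    """Evaluate a filter expression against a document (a plain dict)."""
    op = flt[0]
    if op == "And":
        return all(matches(doc, sub) for sub in flt[1])
    if op == "Or":
        return any(matches(doc, sub) for sub in flt[1])
    if op == "Not":
        return not matches(doc, flt[1])
    attr, cond, value = flt
    actual = doc.get(attr)  # None stands in for a missing attribute
    if cond == "Eq":
        return actual == value  # Eq null matches a missing attribute
    if cond == "NotEq":
        return actual != value  # NotEq null matches a present attribute
    if cond == "In":
        if isinstance(actual, list):
            # Array attribute: match if the two sets intersect.
            return any(item in value for item in actual)
        return actual in value
    if cond == "Gte":
        return actual is not None and actual >= value
    raise ValueError(f"unsupported operator: {cond}")


doc = {"id": 1500, "permissions": ["a", "b"]}
allowed = matches(doc, ["And", [
    ["id", "Gte", 1000],
    ["permissions", "In", ["b", "z"]],
]])
```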
Using nested And and Or filters:
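A query layer might build nested filters with small helpers, as in this sketch. The helper names and_ and or_ are hypothetical, and the attribute names and values are reused from the examples earlier on this page.

```python
def and_(*conditions):
    """Combine conditions so that all of them must match."""
    return ["And", list(conditions)]


def or_(*conditions):
    """Combine conditions so that at least one must match."""
    return ["Or", list(conditions)]


# Documents on either page that are public or carry a listed permission.
filters = and_(
    ["page_id", "In", ["page1", "page2"]],
    or_(
        ["public", "Eq", 1],
        ["permission_id", "In", ["3iQK2VC4", "wzw8zpnQ"]],
    ),
)
query_body = {"rank_by": ["id", "asc"], "top_k": 10, "filters": filters}
```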
When Ordering by Attributes, you can page through results by advancing a filter on the order attribute. For example, to paginate by ID, advance a greater-than filter on ID:
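The loop below sketches that pattern. run_query is a hypothetical callable that takes a request body and returns a list of row dicts, and Gt is assumed to be the name of the greater-than operator, by analogy with Gte. A fake in-memory run_query stands in for real requests.

```python
def paginate_by_id(run_query, page_size, base_filters=None):
    """Yield all matching rows, advancing a Gt filter on id page by page."""
    last_id = None
    while True:
        conditions = []
        if base_filters is not None:
            conditions.append(base_filters)
        if last_id is not None:
            conditions.append(["id", "Gt", last_id])
        body = {"rank_by": ["id", "asc"], "top_k": page_size}
        if len(conditions) == 1:
            body["filters"] = conditions[0]
        elif conditions:
            body["filters"] = ["And", conditions]
        rows = run_query(body)
        yield from rows
        if len(rows) < page_size:
            return  # a short page means there is nothing left
        last_id = rows[-1]["id"]


# Fake backend holding seven documents, for demonstration only.
_docs = [{"id": i} for i in range(1, 8)]


def fake_run_query(body):
    flt = body.get("filters")
    after = flt[2] if flt else 0
    return [d for d in _docs if d["id"] > after][: body["top_k"]]


all_ids = [row["id"] for row in paginate_by_id(fake_run_query, page_size=3)]
```

Stopping on a short page costs one extra empty request when the result count is an exact multiple of the page size.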
Currently, paginating beyond the first page for full-text search and vector search is not supported. Pass a larger top_k value to get more results and paginate client-side. If you need a higher limit, please contact us.