POST /v2/namespaces/:namespace/query

Query, filter, full-text search and vector search documents.

Latency (1M docs)

Percentile    Latency
p50           16ms
p90           21ms
p99           33ms

A query retrieves documents in a single namespace, returning the ordered or highest-ranked documents that match the query's filters.

turbopuffer supports the following types of queries:

  • Vector search: find the documents closest to a query vector
  • Full-text search: find the documents with the highest score under BM25, a classic text search ranking function that considers query term frequency and document length
  • Ordering by attributes: find all documents matching filters in order of an attribute
  • Lookups: find all documents matching filters when order isn't important
  • Aggregations: aggregate attribute values across all documents matching filters

Request

rank_by array, required unless aggregate_by is set

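How to rank the documents in the namespace. The supported ranking functions are vector search (ANN), full-text search (BM25), and ordering by an attribute; an example of each follows.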

For hybrid search, you must run multiple queries (e.g. BM25 and vector) and combine the results client-side, for example with reciprocal-rank fusion. We encourage users to write a strong query-layer abstraction, as it's not uncommon to issue several turbopuffer queries per user query. turbopuffer will soon support multiple queries in the same request.

Vector example: ["vector", "ANN", [0.1, 0.2, 0.3, ..., 76.8]]

BM25 example: ["text", "BM25", "fox jumping"]

Order by attribute example: ["timestamp", "desc"]

BM25 with multiple, weighted fields:

["Sum", [
  ["Product", [2, ["title", "BM25", "fox jumping"]]],
  ["content", "BM25", "fox jumping"]
]]

top_k number, required unless aggregate_by is set

Number of documents to return.

Maximum: 1200 (adjustable upon request)


filters array, optional

Exact filters on attributes that narrow the search results. Think of it as a SQL WHERE clause.

See Filtering Parameters below for details.

When combined with a vector, the query planner will automatically combine the attribute index and the approximate nearest neighbor index for best performance and recall. See our post on Native Filtering for details.

For the best performance, separate documents into namespaces instead of filtering where possible. See also Performance.

Example: ["And", [["id", "Gte", 1000], ["permissions", "In", ["3d7a7296-3d6a-4796-8fb0-f90406b1f621", "92ef7c95-a212-43a4-ae4e-0ebc96a65764"]]]]


include_attributes array[string] | boolean, default: id

List of attribute names to return in the response. Can be set to true to return all attributes. Return only the ones you need for best performance.


aggregate_by object, required unless rank_by is set

Aggregations to compute over all documents in the namespace that match the filters.

Cannot be specified with rank_by, top_k, or include_attributes. We plan to lift these restrictions soon.

Each entry in the object maps a label for the aggregation to an aggregate function. Supported aggregate functions:

  • ["Count", "attr"]: counts the number of documents with a non-null value for the attr attribute. Limitation: currently only the id attribute is supported.

Example: {"my_count_of_ids": ["Count", "id"]}


vector_encoding string, default: float

The encoding to use for the vectors in the response. The supported encodings are float and base64.

If float, vectors are returned as arrays of numbers.

If base64, vectors are returned as base64-encoded strings representing the vectors serialized in little-endian float32 binary format.

This parameter has no effect if the vector attribute is not included in the response (see the include_attributes parameter).
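
As an illustration of the base64 encoding described above, the following sketch decodes a base64 string into a list of floats by reading the bytes as little-endian float32 values. The helper name decode_vector is an assumption made for this example, and the sample vector is made up and round-tripped locally rather than taken from a real response.

```python
import base64
import struct


def decode_vector(encoded):
    """Decode a base64 string of little-endian float32 values into floats.

    decode_vector is a hypothetical helper, not part of any client library.
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" selects little-endian byte order, "f" is a 4-byte float32.
    return list(struct.unpack(f"<{count}f", raw))


# Build a sample payload locally (values chosen to be exact in float32).
original = [0.5, -1.25, 3.0]
encoded = base64.b64encode(struct.pack("<3f", *original)).decode("ascii")
decoded = decode_vector(encoded)
print(decoded)
```

Values that are not exactly representable in float32 will come back rounded to the nearest float32 value.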


consistency object, default: {"level": "strong"}

Choose between strong and eventual read-after-write consistency.

  • Strong consistency (default): {"level": "strong"}
  • Eventual consistency: {"level": "eventual"}

Strong consistency requires a round-trip to object storage to fetch the latest writes before returning a query result, ensuring up-to-date data but adding latency. Eventual consistency removes this requirement, potentially reducing latency while causing stale reads in some cases. Benchmarking on a vector workload (768 dims, 1M docs, ~3GB) shows a p50 warm latency of 16 ms for strong consistency and 10 ms for eventual consistency.

Most queries are served by the same node that handles writes, so updates are usually visible immediately. Over 99.99% of queries return consistent data. Here's a more specific breakdown based on our monitoring data:

% of queries    Maximum lag (<= time)
99.9970%        0s (strongly consistent)
99.9973%        1s
99.9975%        10s
99.9976%        60s
100%            1h

In rare cases (e.g. when namespace routing changes during scaling), reads may briefly return stale data until the cache updates. This staleness is typically limited to ~100ms (the commit log entry is updated in the background), with a strict upper bound of 1 hour (currently not configurable, but subject to future tuning). However, the cache is refreshed on every query, so the latest writes should appear on the next request.

Response

rows array

An array of the top_k documents that matched the query, ordered by the ranking function. Only present if rank_by is specified.

Each document is an object containing the requested attributes. The id attribute is always included. The special attribute $dist is set to the ranking function's score for the document (distance from the query vector for ANN; BM25 score for BM25; omitted when ordering by an attribute).

Example:

[
  {"$dist": 1.7, "id": 8, "extra_attr": "puffer"},
  {"$dist": 3.1, "id": 20, "extra_attr": "fish"}
]

aggregations object

An object mapping the label for each requested aggregation to the computed value. Only present if aggregate_by is specified.

Example:

{ "my_count_of_ids": 42 }

billing object

The billable resources consumed by the query. The object contains the following fields:

  • billable_logical_bytes_queried (uint): the number of logical bytes processed by the query
  • billable_logical_bytes_returned (uint): the number of logical bytes returned by the query

performance object

The performance metrics for the query. The object currently contains the following fields, but these fields may change name, type, or meaning in the future:

  • cache_hit_ratio (float): the ratio of cache hits to total cache lookups
  • cache_temperature (string): a qualitative description of the cache hit ratio (hot, warm, or cold)
  • server_total_ms (uint): request time measured on the server, including time spent waiting for other queries to complete if the namespace was at its concurrency limit
  • query_execution_ms (uint): request time measured on the server, excluding time spent waiting due to the namespace concurrency limit
  • exhaustive_search_count (uint): the number of unindexed documents processed by the query
  • approx_namespace_count (uint): the approximate number of documents in the namespace

Contact the turbopuffer team if you need help interpreting these metrics.

Examples

Vector Search

The query vector must have the same dimensionality as the vectors in the namespace being queried.
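
A minimal sketch of a vector query body, assembled as plain JSON with a local dimensionality check before it is sent. The helper name build_vector_query and the 3-dimensional vector are illustrative assumptions; only the rank_by and top_k fields come from the reference above.

```python
import json


def build_vector_query(vector, expected_dims, top_k=10):
    """Build a body for POST /v2/namespaces/:namespace/query.

    build_vector_query is a hypothetical helper; expected_dims is whatever
    dimensionality the namespace's vectors were written with.
    """
    if len(vector) != expected_dims:
        raise ValueError(
            f"query vector has {len(vector)} dims, expected {expected_dims}"
        )
    return {"rank_by": ["vector", "ANN", list(vector)], "top_k": top_k}


body = build_vector_query([0.1, 0.2, 0.3], expected_dims=3, top_k=5)
print(json.dumps(body))
```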

Filters

When you need to filter documents, you can combine filters with vector search or use them alone. Here's an example of finding recent public documents:
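
A sketch of such a request body, combining a vector ranking with an And filter. The attribute names public and timestamp, the integer encodings chosen for them, and the cutoff value are assumptions for illustration; substitute the attributes and types from your own schema.

```python
import json

# Assumed schema: `public` is an int flag (1 = public) and `timestamp` is an
# integer Unix time in seconds. Both are illustrative, not required names.
cutoff = 1_700_000_000

query = {
    "rank_by": ["vector", "ANN", [0.1, 0.2, 0.3]],
    "top_k": 10,
    "filters": ["And", [
        ["public", "Eq", 1],
        ["timestamp", "Gte", cutoff],
    ]],
    "include_attributes": ["timestamp"],
}

print(json.dumps(query, indent=2))
```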

Ordering by Attributes

You can specify a rank_by parameter to order results by a specific attribute (like SQL ORDER BY). For example, to order by timestamp in descending order:
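
A small sketch of building such a body. The helper name order_by_query is hypothetical, and timestamp is assumed to be an attribute that exists in the namespace; the "asc" and "desc" directions are the two shown elsewhere on this page.

```python
import json


def order_by_query(attribute, direction="desc", top_k=10, filters=None):
    """Build a query body that orders results by a single attribute."""
    if direction not in ("asc", "desc"):
        raise ValueError(f"unknown direction: {direction!r}")
    body = {"rank_by": [attribute, direction], "top_k": top_k}
    if filters is not None:
        body["filters"] = filters
    return body


newest_first = order_by_query("timestamp", "desc", top_k=100)
print(json.dumps(newest_first))
```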

Ordering by multiple attributes isn't yet implemented.

Similar to SQL, the ordering of results is not guaranteed when multiple documents have the same attribute value for the rank_by parameter. Array attributes aren't supported.

Lookups

To find all documents matching filters when order isn't important to you, rank by the id attribute, which is guaranteed to be present in every namespace:

"filters": [...],
"rank_by": ["id", "asc"],
"top_k": ...

If you expect more than top_k results, see Pagination.

Aggregations

You can aggregate attribute values across all documents in the namespace that match the query's filters using the aggregate_by parameter.

For example, to count the number of documents in a namespace:
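
A sketch of a count request and of reading the result. The helper name build_count_query is hypothetical, and sample_response only mirrors the response shape documented above; it is not a live result.

```python
import json


def build_count_query(label="my_count_of_ids", filters=None):
    """Build an aggregation body; rank_by, top_k and include_attributes are
    deliberately left out because they cannot be combined with aggregate_by."""
    body = {"aggregate_by": {label: ["Count", "id"]}}
    if filters is not None:
        body["filters"] = filters
    return body


body = build_count_query()
print(json.dumps(body))

# Response shape copied from the Response section above, not a real response.
sample_response = {"aggregations": {"my_count_of_ids": 42}}
count = sample_response["aggregations"]["my_count_of_ids"]
print(count)
```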

You cannot currently combine aggregations with rank_by. We plan to lift this restriction soon.

Full-Text Search

The attribute used for full-text search must be configured with full_text_search set in the schema when writing documents. See Schema documentation and the Full-Text Search guide for more details.

For an example of hybrid search (combining both vector and BM25 results), see Hybrid Search.
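
As a rough sketch of the client-side combination step, reciprocal-rank fusion can be written in a few lines. The function name, the made-up id lists, and the constant k=60 (a commonly used value, not a turbopuffer setting) are assumptions; each input list stands for the ids of the rows returned by one query, in rank order.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_ids = [8, 20, 3]  # made-up ids from a vector query
bm25_ids = [20, 5, 8]    # made-up ids from a BM25 query
fused = reciprocal_rank_fusion([vector_ids, bm25_ids])
print(fused)
```

Documents that appear in both lists rise to the top, which is why ids 20 and 8 outrank the ids that only one query returned.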

You can combine BM25 full-text search with filters to limit results to a specific subset of documents.
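
A sketch of a BM25 query restricted by a filter. The attribute names content and public are assumptions; content is assumed to have full_text_search enabled in the schema.

```python
import json

query = {
    "rank_by": ["content", "BM25", "quick fox"],
    "top_k": 10,
    # Only score documents whose (assumed) `public` flag equals 1.
    "filters": ["public", "Eq", 1],
}

print(json.dumps(query))
```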

FTS operators

FTS operators combine the results of multiple sub-queries into a single score. Specifically, the following operators are supported:

  • Sum: Sum the scores of the sub-queries.
  • Max: Use the maximum score of sub-queries as the score.

Operators can be nested. For example:

"rank_by": ["Sum", [
  ["Max", [
    ["title", "BM25", "whale facts"],
    ["description", "BM25", "whale facts"]
  ]],
  ["content", "BM25", "huge whale"]
]]

Field weights/boosts

You can specify a per-field weight (boost) by using the Product operator inside a rank_by. For example, to apply a 2x score multiplier to the title sub-query:

"rank_by": ["Sum", [
  ["Product", [2, ["title", "BM25", "quick fox"]]],
  ["content", "BM25", "quick fox"]
]]
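
To make the composition of Sum, Max, and Product concrete, the sketch below evaluates a rank_by expression locally against made-up per-field BM25 scores. It only illustrates how the operators combine sub-query scores; it is not turbopuffer's scoring code, and the function name and score values are assumptions.

```python
def combine(expr, field_scores):
    """Evaluate a rank_by expression given per-(field, query) BM25 scores."""
    # Leaf sub-query: [field, "BM25", query]
    if len(expr) == 3 and expr[1] == "BM25":
        field, _, query = expr
        return field_scores.get((field, query), 0.0)
    op, arg = expr
    if op == "Sum":
        return sum(combine(sub, field_scores) for sub in arg)
    if op == "Max":
        return max(combine(sub, field_scores) for sub in arg)
    if op == "Product":
        weight, sub = arg  # weight first, as in the example above
        return weight * combine(sub, field_scores)
    raise ValueError(f"unsupported operator: {op!r}")


weighted = ["Sum", [
    ["Product", [2, ["title", "BM25", "quick fox"]]],
    ["content", "BM25", "quick fox"],
]]
scores = {("title", "quick fox"): 1.5, ("content", "quick fox"): 0.5}
total = combine(weighted, scores)
print(total)
```

With these made-up scores the title contributes 2 * 1.5 and the content contributes 0.5.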

Phrase matching

A simple form of phrase matching is supported with the ContainsAllTokens filter. This filter matches documents that contain all the tokens present in the filter input string:

"filters": ["text", "ContainsAllTokens", "lazy walrus"]

Specifically, this filter would match a document containing "walrus is super lazy", but not a document containing only "lazy." Combining this with a Not filter can help exclude unwanted results:

"filters": ["Not", ["text", "ContainsAllTokens", "polar bear"]]

Full phrase matching, i.e. requiring the exact phrase "lazy walrus", with the terms adjacent and in that order, is not yet supported.
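
The difference between ContainsAllTokens and true phrase matching can be seen in a toy model. The tokenizer below (lowercased word characters) is an assumption made for illustration; the tokenizer actually used depends on the attribute's full-text search configuration.

```python
import re


def tokenize(text):
    # Assumed tokenization: lowercase runs of word characters.
    return set(re.findall(r"\w+", text.lower()))


def contains_all_tokens(document_text, filter_input):
    """True if every token of filter_input occurs somewhere in the document."""
    return tokenize(filter_input) <= tokenize(document_text)


print(contains_all_tokens("walrus is super lazy", "lazy walrus"))
print(contains_all_tokens("lazy", "lazy walrus"))
# Order and adjacency are ignored, so this is not an exact-phrase match:
print(contains_all_tokens("a lazy dog chased the walrus", "lazy walrus"))
```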

Filtering

Filters allow you to narrow down results by applying exact conditions to attributes. Conditions are arrays with an attribute name, operation, and value, for example:

  • ["attr_name", "Eq", 42]
  • ["page_id", "In", ["page1", "page2"]]
  • ["user_migrated_at", "NotEq", null]

Values must have the same type as the attribute's value, or an array of that type for operators like In.

Conditions can be combined using And and Or operations:

// basic And condition
"filters": ["And", [
  ["attr_name", "Eq", 42],
  ["page_id", "In", ["page1", "page2"]]
]]

// conditions can be nested
"filters": ["And", [
  ["page_id", "In", ["page1", "page2"]],
  ["Or", [
    ["public", "Eq", 1],
    ["permission_id", "In", ["3iQK2VC4", "wzw8zpnQ"]]
  ]]
]]

Filters can also be applied to the id field, which refers to the document ID.

Filtering Parameters

And array[filter]

Matches if all of the filters match.

Or array[filter]

Matches if at least one of the filters matches.

Not filter

Matches if the filter does not match.


Eq id or value

Exact match for id or attribute values. If value is null, matches documents missing the attribute.

NotEq value

Inverse of Eq, for attribute values. If value is null, matches documents that have the attribute.


In array[id] or array[value]

Matches any id or attribute values contained in the provided list. If both the provided value and the target document field are arrays, this checks whether the two sets share any element.

NotIn array[value]

Inverse of In: matches any attribute values not contained in the provided list.


Lt value

For ints, this is a numeric less-than on attribute values. For strings, lexicographic less-than. For datetimes, numeric less-than on the millisecond representation.

Lte value

For ints, this is a numeric less-than-or-equal on attribute values. For strings, lexicographic less-than-or-equal. For datetimes, numeric less-than-or-equal on the millisecond representation.

Gt value

For ints, this is a numeric greater-than on attribute values. For strings, lexicographic greater-than. For datetimes, numeric greater-than on the millisecond representation.

Gte value

For ints, this is a numeric greater-than-or-equal on attribute values. For strings, lexicographic greater-than-or-equal. For datetimes, numeric greater-than-or-equal on the millisecond representation.


Glob globset

Unix-style glob match against string attribute values. The full syntax is described in the globset documentation. Glob patterns with a concrete prefix like "foo*" internally compile to efficient range queries.

NotGlob globset

Inverse of Glob: Unix-style glob filters against string attribute values. The full syntax is described in the globset documentation.

IGlob globset

Case-insensitive version of Glob.

NotIGlob globset

Case-insensitive version of NotGlob.
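
To see why a concrete-prefix glob such as "foo*" can be answered with a range query, note that the strings starting with a prefix form one contiguous lexicographic interval. The sketch below computes that interval; it is an illustration of the general idea under code-point ordering, not a description of turbopuffer's internal implementation, and prefix_range is a hypothetical name.

```python
def prefix_range(prefix):
    """Return (lower, upper) such that lower <= s < upper exactly when
    s starts with prefix, under code-point string ordering."""
    if not prefix:
        raise ValueError("an empty prefix matches every string")
    last = ord(prefix[-1])
    if last == 0x10FFFF:
        raise ValueError("last character has no successor code point")
    # Bump the final character to get the smallest string above the range.
    return prefix, prefix[:-1] + chr(last + 1)


lower, upper = prefix_range("foo")
print(lower, upper)

for candidate in ["fo", "foo", "foobar", "fop"]:
    in_range = lower <= candidate < upper
    print(candidate, in_range, candidate.startswith("foo"))
```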


ContainsAllTokens string

Matches if all tokens in the input string are present in the attribute's value. Requires that the attribute is configured for full-text search.

Complex Example

Using nested And and Or filters:
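
A sketch of a nested filter, together with a tiny local evaluator that shows which documents it would select. The filter reuses the attribute names from the earlier examples on this page; the sample documents and the matches function are made up for illustration, and the evaluator covers only the operators used here.

```python
filters = ["And", [
    ["id", "Gte", 1000],
    ["page_id", "In", ["page1", "page2"]],
    ["Or", [
        ["public", "Eq", 1],
        ["permission_id", "In", ["3iQK2VC4", "wzw8zpnQ"]],
    ]],
]]


def matches(doc, f):
    """Evaluate a filter locally (illustration of the semantics only)."""
    if len(f) == 2:  # logical operator: [op, operand]
        op, arg = f
        if op == "And":
            return all(matches(doc, sub) for sub in arg)
        if op == "Or":
            return any(matches(doc, sub) for sub in arg)
        if op == "Not":
            return not matches(doc, arg)
        raise ValueError(f"unsupported operator: {op!r}")
    attr, op, value = f  # condition: [attribute, operator, value]
    actual = doc.get(attr)
    if op == "Eq":
        return actual == value
    if op == "In":
        return actual in value
    if op == "Gte":
        return actual is not None and actual >= value
    raise ValueError(f"unsupported operator in this sketch: {op!r}")


docs = [
    {"id": 1001, "page_id": "page1", "public": 1, "permission_id": "none"},
    {"id": 1002, "page_id": "page2", "public": 0, "permission_id": "3iQK2VC4"},
    {"id": 1003, "page_id": "page3", "public": 1, "permission_id": "none"},
    {"id": 999, "page_id": "page1", "public": 1, "permission_id": "none"},
    {"id": 1004, "page_id": "page1", "public": 0, "permission_id": "other"},
]
matched = [d["id"] for d in docs if matches(d, filters)]
print(matched)
```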

Pagination

When Ordering by Attributes, you can page through results by advancing a filter on the order attribute. For example, to paginate by ID, advance a greater-than filter on ID:
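
A sketch of that loop. run_query is a hypothetical callable that sends a request body to the query endpoint and returns the parsed response; here it is replaced by an in-memory stand-in so the example runs without a network. The stopping rule (a page shorter than top_k means no more results) is an assumption of this sketch.

```python
def paginate_by_id(run_query, page_size, base_filters=None):
    """Yield every matching row by advancing an id Gt filter page by page."""
    last_id = None
    while True:
        conditions = []
        if base_filters is not None:
            conditions.append(base_filters)
        if last_id is not None:
            conditions.append(["id", "Gt", last_id])
        body = {"rank_by": ["id", "asc"], "top_k": page_size}
        if len(conditions) == 1:
            body["filters"] = conditions[0]
        elif conditions:
            body["filters"] = ["And", conditions]
        rows = run_query(body)["rows"]
        yield from rows
        if len(rows) < page_size:
            return  # short page: nothing left to fetch
        last_id = rows[-1]["id"]


def fake_run_query(body):
    """In-memory stand-in for the endpoint, holding ids 1 through 7."""
    f = body.get("filters")
    after = f[2] if f else 0
    rows = [{"id": i} for i in range(1, 8) if i > after]
    return {"rows": rows[: body["top_k"]]}


all_ids = [row["id"] for row in paginate_by_id(fake_run_query, page_size=3)]
print(all_ids)
```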

Paginating beyond the first page is not currently supported for full-text search or vector search. Pass a larger top_k value to get more results and paginate client-side. If you need a higher limit, please contact us.
