Query documents

POST /v1/namespaces/:namespace/query

Query, filter, full-text search and vector search documents.

Latency (1M docs)

Percentile   Latency
p50          17ms
p90          24ms
p99          39ms

Parameters

filters object | array, optional

Exact-match conditions on attributes that narrow the search results. Think of it as a SQL WHERE clause.

See Filtering Parameters below for details.

When combined with a vector, the query planner will automatically combine the attribute index and the approximate nearest neighbor index for best performance and recall.

For the best performance, separate documents into namespaces instead of filtering where possible. See also Performance.

Example: ["And", [["id", "Gte", 1000], ["permissions", "In", ["3d7a7296-3d6a-4796-8fb0-f90406b1f621", "92ef7c95-a212-43a4-ae4e-0ebc96a65764"]]]]


vector array[float], optional

The vector to search for.

It must have the same number of dimensions as the vectors in the namespace.

Example: [0.1, 0.2, 0.3, ..., 76.8]


rank_by array, optional

Used for BM25 full-text search or ordering by attributes.

Currently, you can pass a rank_by parameter or a vector parameter, but not both. If neither is passed, results are sorted by ID.

For hybrid search, you must do multiple queries (e.g. BM25 + vector) and combine the results client-side with e.g. reciprocal-rank fusion. We encourage users to write a strong query layer abstraction, as it's not uncommon to do 6 turbopuffer queries per user query.

Order by attribute example: ["timestamp", "desc"]

BM25: ["text", "BM25", "fox jumping"]

BM25 with multiple fields: ["Sum", [["text", "BM25", "fox jumping"], ["name", "BM25", "fox jumping"]]]
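
The client-side fusion step for hybrid search mentioned above can be sketched roughly as follows. This is a minimal illustration of reciprocal-rank fusion, not turbopuffer's own implementation: the function name, the constant k=60 (a commonly used default), and the sample ID lists are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is a conventional default, not a
    turbopuffer setting.
    """
    scores = defaultdict(float)
    for ids in result_lists:
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ID lists, as if returned by one BM25 query and one vector query.
bm25_ids = [3, 1, 2]
vector_ids = [1, 4, 3]
fused = reciprocal_rank_fusion([bm25_ids, vector_ids])
```

Documents that rank well in both lists rise to the top, while documents found by only one query still appear further down.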


distance_metric cosine_distance | euclidean_squared, required if vector is set

The function used to calculate vector similarity. Possible values are cosine_distance or euclidean_squared.

cosine_distance is defined as 1 - cosine_similarity and ranges from 0 to 2. Lower is better.

euclidean_squared is defined as sum((x - y)^2). Lower is better.
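
For intuition, the two definitions above can be written out in a few lines of plain Python. This is only a sketch of the formulas as stated here (the function names mirror the parameter values for readability); it says nothing about how the service computes them internally.

```python
import math


def cosine_distance(x, y):
    """1 - cosine_similarity: 0 for the same direction, 2 for opposite.

    Undefined for zero-length vectors (division by zero).
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def euclidean_squared(x, y):
    """sum((x - y)^2): squared Euclidean distance, without the square root."""
    return sum((a - b) ** 2 for a, b in zip(x, y))
```

Note that cosine_distance ignores vector magnitude, while euclidean_squared does not.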


top_k number, default: 10

Number of results to return.


include_vectors boolean, default: false

Return vectors for the search results. Vectors are large and slow to deserialize on the client, so use this option only if you need them.


include_attributes array[string] | boolean, default: id

List of attribute names to return in the response. Set to true to return all attributes. For best performance, return only the attributes you need.

Examples

{ // Request payload
  "vector": [0.1, 0.1],
  "distance_metric": "cosine_distance",
  "top_k": 10
}

[ // Response payload
  { "dist": 0.0, "id": 1 },
  { "dist": 2.0, "id": 2 }
]
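
A request like the one above could be assembled with the Python standard library roughly as follows. This is a sketch under stated assumptions: the base URL, the bearer-token Authorization header, the namespace name, and the helper build_query_request are illustrative and not confirmed by this page. Only the path /v1/namespaces/:namespace/query and the payload fields come from the documentation above.

```python
import json
import urllib.request

# Assumption: base URL and bearer-token auth are placeholders for illustration.
BASE_URL = "https://api.turbopuffer.com"


def build_query_request(namespace, payload, api_key):
    """Build (but do not send) a POST request for the query endpoint."""
    url = f"{BASE_URL}/v1/namespaces/{namespace}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    "my-namespace",  # hypothetical namespace name
    {"vector": [0.1, 0.1], "distance_metric": "cosine_distance", "top_k": 10},
    api_key="YOUR_API_KEY",
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```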

Filters

When you need to filter documents, you can combine filters with vector search or use them alone. Here's an example of finding recent public documents:

{ // Request payload
  "filters": ["And", [
    ["timestamp", "Gte", 1709251200], // Documents after March 1, 2024
    ["public", "Eq", true]
  ]],
  "vector": [0.1, 0.2, 0.3], // Optional: include vector to combine with filters
  "distance_metric": "cosine_distance", // Required if vector is set
  "top_k": 10,
  "include_attributes": ["title", "timestamp"]
}

[ // Response payload
  {
    "id": 1,
    "dist": 0.15, // Only present when vector search is used
    "attributes": {
      "title": "Getting Started Guide",
      "timestamp": 1709337600  // March 2, 2024
    }
  },
  {
    "id": 2,
    "dist": 0.28,
    "attributes": {
      "title": "Advanced Features",
      "timestamp": 1709424000  // March 3, 2024
    }
  }
]

Ordering by Attributes

Filter-only (no vector or FTS/BM25) queries can specify a rank_by parameter to order results by a specific attribute (i.e. SQL ORDER BY). For example, to order by timestamp in descending order:

{ // Request payload
  "filters": ["timestamp", "Lt", 1709251200], // Documents before March 1, 2024
  "rank_by": ["timestamp", "desc"], // Order by timestamp in descending order
  "top_k": 1000,
  "include_attributes": ["title", "timestamp"]
}

[ // Response payload
  {
    "id": 6,
    "attributes": {
      "title": "Roadmap",
      "timestamp": 1709020800  // Feb 27th, 2024
    }
  },
  {
    "id": 4,
    "attributes": {
      "title": "Performance Guide",
      "timestamp": 1708761600  // Feb 24th, 2024
    }
  }
]

Ordering by multiple attributes isn't yet implemented.

Similar to SQL, the ordering of results is not guaranteed when multiple documents have the same attribute value for the rank_by parameter. Array attributes aren't supported.

Full-Text Search

The attribute used for full-text search (FTS) must have full_text_search enabled in its schema when documents are upserted. See Schema documentation and the Full-Text Search guide for more details.

You can combine BM25 full-text search with filters to search within specific document subsets. For hybrid search combining both vector and BM25 results, see Hybrid Search.

{ // Request payload
  "rank_by": ["content", "BM25", "quick fox"],
  "filters": ["And", [
    ["timestamp", "Gte", 1709251200], // Documents after March 1, 2024
    ["public", "Eq", true]
  ]],
  "top_k": 10,
  "include_attributes": ["title", "content", "timestamp"]
}

[ // Response payload
  {
    "id": 1,
    "dist": 0.85, // BM25 relevance score (lower is better)
    "attributes": {
      "title": "Animal Stories",
      "content": "The quick brown fox jumps over the lazy dog",
      "timestamp": 1709337600  // March 2, 2024
    }
  },
  {
    "id": 2,
    "dist": 1.28,
    "attributes": {
      "title": "Forest Tales",
      "content": "A quick red fox darted through the forest",
      "timestamp": 1709424000  // March 3, 2024
    }
  }
]

Filtering

Filters allow you to narrow down results by applying exact conditions to attributes. Conditions are arrays with an attribute name, operation, and value, for example:

  • ["attr_name", "Eq", 42]
  • ["page_id", "In", ["page1", "page2"]]
  • ["user_migrated_at", "NotEq", null]

Values must have the same type as the attribute's value, or an array of that type for operators like In.

Conditions can be combined using And and Or operations:

// basic And condition
"filters": ["And", [
  ["attr_name", "Eq", 42],
  ["page_id", "In", ["page1", "page2"]]
]]

// conditions can be nested
"filters": ["And", [
  ["page_id", "In", ["page1", "page2"]],
  ["Or", [
    ["public", "Eq", 1],
    ["permission_id", "In", ["3iQK2VC4", "wzw8zpnQ"]]
  ]]
]]

// legacy API: an object may be provided instead (implicitly And)
"filters": {
  "attr_name": ["Eq", 42],
  "page_id": ["In", ["page1", "page2"]]
}

Filters can also be applied to the id field, which refers to the document ID.

Filtering Parameters

Eq id or value

Exact match for id or attribute values. If value is null, matches documents missing the attribute.


NotEq value

Inverse of Eq, for attribute values. If value is null, matches documents with the attribute.


In array[id] or array[value]

Matches any id or attribute values contained in the provided list. If both the provided value and the target document field are arrays, this checks whether the two sets share at least one element.


NotIn array[value]

Inverse of In: matches any attribute values not contained in the provided list.


Lt value

For ints, this is a numeric less-than on attribute values. For strings, lexicographic less-than.

Lte value

For ints, this is a numeric less-than-or-equal on attribute values. For strings, lexicographic less-than-or-equal.

Gt value

For ints, this is a numeric greater-than on attribute values. For strings, lexicographic greater-than.

Gte value

For ints, this is a numeric greater-than-or-equal on attribute values. For strings, lexicographic greater-than-or-equal.


Glob globset

Unix-style glob match against attribute values. The full syntax is described in the globset documentation. Glob patterns with a concrete prefix like "foo*" internally compile to efficient range queries.

NotGlob globset

Inverse of Glob: matches attribute values that do not match the Unix-style glob. The full syntax is described in the globset documentation.

IGlob globset

Case insensitive version of Glob.

NotIGlob globset

Case insensitive version of NotGlob.
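
To see why a glob with a concrete prefix such as "foo*" can be served as a range query, consider the rough equivalence sketched below. This illustrates the idea only and is not the service's internal compilation: the helper name is hypothetical, it assumes an ASCII prefix with a single trailing "*", and it relies on lexicographic string comparison as described for Gte and Lt above.

```python
def prefix_glob_to_range(attr, pattern):
    """Rewrite a glob like "foo*" as an equivalent Gte/Lt range filter.

    Only handles a literal prefix followed by a single trailing "*".
    Returns None for any other pattern.
    """
    prefix = pattern[:-1]
    if not pattern.endswith("*") or not prefix or any(c in prefix for c in "*?[]{}\\"):
        return None
    # Smallest string greater than every string starting with the prefix:
    # bump the last character by one code point (assumes an ASCII prefix).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return ["And", [[attr, "Gte", prefix], [attr, "Lt", upper]]]


range_filter = prefix_glob_to_range("filename", "foo*")
```

Every string that starts with "foo" sorts at or after "foo" and before "fop", so the two comparisons bound exactly the prefix matches.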


And array[filter]

Matches if all of the filters match.

Or array[filter]

Matches if at least one of the filters matches.
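
The operator semantics listed above can be mirrored in a small client-side evaluator, which may be handy for unit-testing filter expressions before sending them. This is a sketch of the documented behaviour, not the server's implementation. The function name matches is hypothetical, the Glob family is omitted, and the handling of range comparisons on missing attributes and of NotIn on array attributes are assumptions.

```python
def matches(doc, flt):
    """Return True if doc (a dict of attributes, including "id") satisfies flt."""
    if flt[0] == "And":
        return all(matches(doc, f) for f in flt[1])
    if flt[0] == "Or":
        return any(matches(doc, f) for f in flt[1])

    attr, op, value = flt
    actual = doc.get(attr)  # None when the attribute is missing
    if op == "Eq":
        return actual == value  # ["attr", "Eq", None] matches a missing attribute
    if op == "NotEq":
        return actual != value
    if op in ("In", "NotIn"):
        # Array attributes match when any element appears in the provided list.
        candidates = actual if isinstance(actual, list) else [actual]
        found = any(c in value for c in candidates)
        return found if op == "In" else not found
    if actual is None:
        return False  # assumption: range comparisons never match a missing attribute
    if op == "Lt":
        return actual < value
    if op == "Lte":
        return actual <= value
    if op == "Gt":
        return actual > value
    if op == "Gte":
        return actual >= value
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical document used to exercise the evaluator.
doc = {"id": 1, "page_id": "page1", "public": True, "tags": ["a", "b"]}
nested = ["And", [
    ["page_id", "In", ["page1", "page2"]],
    ["Or", [["public", "Eq", False], ["id", "Gte", 1]]],
]]
is_match = matches(doc, nested)
```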

Complex Example

Using nested And and Or filters:

{
  "vector": [0.1, 0.1],
  "distance_metric": "euclidean_squared",
  "top_k": 10,
  "include_attributes": ["key1"],
  "filters": [
    "And",
    [
      ["id", "In", [1, 2, 3]],
      ["key1", "Eq", "one"],
      ["filename", "NotGlob", "/vendor/**"],
      [
        "Or",
        [
          ["filename", "Glob", "**.tsx"],
          ["filename", "Glob", "**.js"]
        ]
      ]
    ]
  ]
}

Pagination

If you need to paginate the entire namespace, use the more performant Export endpoint.

By default, results are sorted in ascending ID order, which allows filter-only queries to paginate with an ID greater-than filter. You can specify a rank_by parameter to order results by attributes; see Ordering by Attributes.

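import turbopuffer as tpuf

ns = tpuf.Namespace("pagination")

PAGE_SIZE = 1000
last_id = None
while True:
    # First page: id >= 0. Later pages: id strictly greater than the last one seen.
    results = ns.query(
        top_k=PAGE_SIZE,
        filters=["And", [
            ["timestamp", "Gte", 1],
            ["id", "Gte" if last_id is None else "Gt", last_id or 0]
        ]]
    )
    print(results)
    # A short (or empty) page means there is nothing left to fetch.
    if not results or len(results) < PAGE_SIZE:
        break
    last_id = results[-1].id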

Currently, paginating beyond the first page is not supported for full-text search or vector search. To get more results, pass a larger top_k value and paginate client-side. If you need a higher limit, please contact us.
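
Client-side pagination over a single large result set can be as simple as slicing the list that came back. A minimal sketch, assuming the results of one query with a large top_k are already held in memory; the helper name and the sample rows are hypothetical:

```python
def paginate(results, page_size):
    """Yield consecutive pages from results already fetched with a large top_k."""
    for start in range(0, len(results), page_size):
        yield results[start:start + page_size]


# Hypothetical stand-in for the rows of a single query with top_k=25.
results = [{"id": i} for i in range(1, 26)]
pages = list(paginate(results, page_size=10))
```

Because the ranking is computed once, every page comes from the same consistent ordering.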

© 2024 turbopuffer Inc.