Upsert vectors

POST /v1/vectors/:namespace

Creates, updates, or deletes vectors.

Percentile   Latency
P50          285ms
P90          370ms
P99          688ms
MAX          1250ms

Writes are consistent and thus immediately visible to queries.

The :namespace parameter identifies a set of vectors. Within a namespace, vectors are uniquely referred to by their ID. Upserting a vector will overwrite any existing vector with the same ID.
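The overwrite rule is easiest to see with a toy model. The sketch below is a hypothetical, in-memory stand-in for a single namespace (the `LocalNamespace` class is invented for illustration and is not turbopuffer's implementation). It only mirrors the documented behavior: IDs are unique within a namespace, upserting an existing ID overwrites it, and upserting a null vector deletes it.

```python
from typing import Optional, Union

VectorId = Union[int, str]


class LocalNamespace:
    """Hypothetical in-memory model of one namespace's documented upsert rules.

    Not turbopuffer's implementation: IDs are unique within the namespace,
    upserting an existing ID overwrites it, and a null vector deletes it.
    """

    def __init__(self) -> None:
        self.rows: dict[VectorId, list[float]] = {}

    def upsert(self, ids: list[VectorId], vectors: list[Optional[list[float]]]) -> None:
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        for vid, vec in zip(ids, vectors):
            if vec is None:
                self.rows.pop(vid, None)  # null vector: delete (no-op if absent)
            else:
                self.rows[vid] = vec  # new ID inserts; existing ID is overwritten


ns = LocalNamespace()
ns.upsert([1, 2], [[0.1, 0.1], [0.2, 0.2]])
ns.upsert([1], [[0.9, 0.9]])  # overwrites vector 1
ns.upsert([2], [None])  # deletes vector 2
print(ns.rows)  # {1: [0.9, 0.9]}
```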

Namespaces are created when the first vector is inserted.

For best performance, we recommend creating a separate namespace for each isolated vector space where possible, rather than storing them in one namespace and filtering.

Each call accepts at most 10,000 upserts and a maximum payload size of 256 MB. Because every write incurs the latency of writing to object storage, we recommend writing in large batches to maximize throughput.
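A client can respect both limits by grouping rows greedily before sending them. The sketch below is a minimal, hypothetical helper (`batch_rows` is not part of any turbopuffer SDK). It assumes "256 MB" means 256 MiB and estimates payload size from each row's compact JSON encoding; the real request adds a small envelope, so leaving some headroom is advisable.

```python
import json
from typing import Iterator

MAX_UPSERTS_PER_CALL = 10_000          # documented per-call limit
MAX_PAYLOAD_BYTES = 256 * 1024 * 1024  # documented 256 MB limit (assumed MiB)


def batch_rows(rows: list[dict], max_rows: int = MAX_UPSERTS_PER_CALL,
               max_bytes: int = MAX_PAYLOAD_BYTES) -> Iterator[list[dict]]:
    """Greedily group row-based upserts into batches under both limits.

    A single row larger than max_bytes is still yielded on its own, so the
    server can reject it rather than the client silently dropping it.
    """
    batch: list[dict] = []
    size = 0
    for row in rows:
        row_size = len(json.dumps(row, separators=(",", ":")).encode("utf-8"))
        # Start a new batch if adding this row would break either limit.
        if batch and (len(batch) >= max_rows or size + row_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += row_size
    if batch:
        yield batch


rows = [{"id": i, "vector": [0.0, 0.0]} for i in range(25_000)]
print([len(b) for b in batch_rows(rows)])  # [10000, 10000, 5000]
```

Greedy packing keeps batches as large as the limits allow, which matches the recommendation to write in large batches.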

If this call returns OK, data is guaranteed to be durably written to object storage. You can read more about how upserts work on the Architecture page.
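For reference, the sketch below builds (but does not send) the `POST /v1/vectors/:namespace` request using only the Python standard library. The base URL and the Bearer-token `Authorization` header are placeholders assumed for illustration, not values confirmed on this page; `build_upsert_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request


def build_upsert_request(base_url: str, namespace: str, payload: dict,
                         api_key: str) -> urllib.request.Request:
    """Build, without sending, a POST /v1/vectors/:namespace request.

    base_url and the Bearer Authorization header are placeholders assumed
    for illustration, not values confirmed by this page.
    """
    # Percent-encode the namespace so it is safe as a single path segment.
    url = f"{base_url.rstrip('/')}/v1/vectors/{urllib.parse.quote(namespace, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


req = build_upsert_request(
    "https://api.example.com",  # hypothetical base URL
    "my-namespace",
    {"ids": [1], "vectors": [[0.1, 0.1]], "distance_metric": "cosine_distance"},
    "YOUR_API_KEY",
)
# Sending it (not run here). Per this page, an OK status means the data has
# been durably written to object storage:
# with urllib.request.urlopen(req) as resp:
#     assert json.load(resp)["status"] == "OK"
```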

If low upsert latency is a critical requirement for you, it can be improved dramatically; just contact us.

Warning: Queries may be slow during periods of high write throughput or after a large bulk import.

turbopuffer can handle >= 10,000 writes per second (WPS) per namespace, but indexing cannot currently keep up at that rate, which causes high query latency during bulk imports. Once write throughput drops (<= 100 writes per second), the indexer catches up and queries become fast again.

Most use cases perform an initial bulk import followed by queries with lower write throughput (<= 100 writes per second), and for this pattern the limitation is not a problem. We are actively working to address it.

Parameters

ids array, required unless upserts is set

Vector IDs are stored as either unsigned 64-bit integers or strings, depending on the type passed in the first request. Mixing ID types is not supported.
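Because the first request fixes the ID type, it can help to validate batches client-side before sending them. The sketch below uses a hypothetical helper, `id_kind`, that classifies a batch as unsigned 64-bit integers or strings and rejects mixed or out-of-range IDs; it is a local check, not a server API.

```python
from typing import Union

U64_MAX = 2**64 - 1


def id_kind(ids: list[Union[int, str]]) -> str:
    """Classify a batch of IDs as "u64" or "string", rejecting mixed types."""
    kinds = set()
    for vid in ids:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(vid, int) and not isinstance(vid, bool):
            if not 0 <= vid <= U64_MAX:
                raise ValueError(f"id {vid} does not fit in an unsigned 64-bit integer")
            kinds.add("u64")
        elif isinstance(vid, str):
            kinds.add("string")
        else:
            raise TypeError(f"unsupported id {vid!r}")
    if len(kinds) != 1:
        raise TypeError("ids must be non-empty and all integers or all strings")
    return kinds.pop()


# The first request fixes the namespace's ID type; later batches must match it.
namespace_kind = id_kind([1, 2, 3])
assert id_kind([4, 5]) == namespace_kind
```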


vectors array, required unless upserts is set

Must be the same length as the ids field. Each element is an array of numbers representing a vector. To delete a vector, pass null as its element in the vectors field.

Vector elements are stored as 32-bit floats. We intend to support several other formats.

Each vector in the namespace must have the same number of dimensions.
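The constraints above can be checked before a request is sent. In the sketch below, `check_vectors` and `as_float32` are hypothetical helpers: the first enforces the documented length, null-delete, and equal-dimension rules, and the second only illustrates the precision lost when a Python (64-bit) float is stored as a 32-bit float.

```python
import struct
from typing import Optional


def check_vectors(ids: list, vectors: list[Optional[list[float]]],
                  dims: Optional[int] = None) -> Optional[int]:
    """Validate a column-based vectors field; return the dimension count.

    `dims` is the namespace's existing dimension count if you track it
    yourself (a hypothetical client-side check, not a server API).
    """
    if len(vectors) != len(ids):
        raise ValueError(f"got {len(vectors)} vectors for {len(ids)} ids")
    for vid, vec in zip(ids, vectors):
        if vec is None:
            continue  # null marks a delete, so it has no dimensions
        if dims is None:
            dims = len(vec)  # the first vector seen sets the expectation
        elif len(vec) != dims:
            raise ValueError(f"vector {vid!r} has {len(vec)} dimensions, expected {dims}")
    return dims


def as_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through 32-bit float precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(check_vectors([1, 2, 3], [[0.1, 0.1], None, [0.3, 0.3]]))  # 2
print(as_float32(0.1))  # 0.10000000149011612
```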


attributes object

Vectors can optionally include attributes, which are used to filter search results. Attributes are key/value mappings. Keys are strings, and values can be strings, unsigned integers, or arrays of either. More value types will be added in the future.

This parameter is an object where the keys are the attribute names, and the values are arrays of attribute values. Each array must be the same length as the ids field. When a vector doesn't have a value for a given attribute, pass null.

Attribute names id and vector are reserved, and an error will be returned if they are set.
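If your data starts out as one attribute dictionary per vector, it has to be pivoted into this column layout. The sketch below shows one way to do that with a hypothetical helper, `columnar_attributes`: each output array has one entry per vector, missing values become `None` (serialized as JSON null), and the reserved names are rejected.

```python
from typing import Any, Optional

RESERVED_ATTRIBUTE_NAMES = {"id", "vector"}


def columnar_attributes(rows: list[dict[str, Any]]) -> dict[str, list[Optional[Any]]]:
    """Turn one attribute dict per vector into the column-based attributes object.

    Output arrays follow the order of `rows`, so they line up with the ids field.
    """
    keys: list[str] = []  # preserve first-seen key order
    for row in rows:
        for key in row:
            if key in RESERVED_ATTRIBUTE_NAMES:
                raise ValueError(f"attribute name {key!r} is reserved")
            if key not in keys:
                keys.append(key)
    # dict.get returns None for a missing key, which becomes JSON null.
    return {key: [row.get(key) for row in rows] for key in keys}


rows = [
    {"my-string": "one", "my-uint": 12, "my-string-array": ["a", "b"]},
    {"my-string-array": ["b", "d"]},
]
print(columnar_attributes(rows))
```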


upserts array

Instead of specifying upserts in the column-based format, you can use this optional parameter to specify them in a row-based format, if that's more convenient. There is no difference in behavior.

Each upsert in this list should specify an id, and may optionally specify a vector and attributes, as defined above. If vector is omitted or null, the operation is a delete.
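Since the two formats behave identically, a row-based list can be converted mechanically into the column-based fields. The sketch below uses a hypothetical helper, `rows_to_columns`, that follows the rules stated here: a missing or null vector stays `None` (a delete), and an attribute missing from a row becomes `None` in that attribute's column. Other top-level fields such as distance_metric are left to the caller.

```python
from typing import Any


def rows_to_columns(upserts: list[dict[str, Any]]) -> dict[str, Any]:
    """Convert a row-based "upserts" list into ids / vectors / attributes."""
    ids = [u["id"] for u in upserts]
    vectors = [u.get("vector") for u in upserts]  # missing or null => delete
    names: list[str] = []
    for u in upserts:
        for name in u.get("attributes") or {}:
            if name not in names:
                names.append(name)
    columns: dict[str, Any] = {"ids": ids, "vectors": vectors}
    if names:
        columns["attributes"] = {
            name: [(u.get("attributes") or {}).get(name) for u in upserts]
            for name in names
        }
    return columns


upserts = [
    {"id": 1, "vector": [0.1, 0.1],
     "attributes": {"my-string": "one", "my-uint": 12, "my-string-array": ["a", "b"]}},
    {"id": 2, "vector": [0.2, 0.2], "attributes": {"my-string-array": ["b", "d"]}},
    {"id": 3},  # no vector: a delete
]
print(rows_to_columns(upserts))
```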


distance_metric string, required

The function used to calculate vector similarity. Possible values are cosine_distance or euclidean_squared.
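For intuition, the sketch below computes both metrics using their standard textbook definitions: cosine distance as 1 minus cosine similarity, and euclidean_squared as the sum of squared per-dimension differences with no square root. That the service uses exactly these formulas is an assumption here; the functions are illustrative, not turbopuffer code.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 1 orthogonal, 2 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norms


def euclidean_squared(a: list[float], b: list[float]) -> float:
    """Sum of squared per-dimension differences (no square root taken)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Note that the example vectors on this page ([0.1, 0.1] through [0.4, 0.4]) all point in the same direction, so their cosine distance from one another is effectively 0, while euclidean_squared still tells them apart.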

Attribute keys must have consistent value types. For example, if a vector is upserted containing attribute key foo with a string value, all future vectors that specify foo must also use a string value. We're actively working on tooling to support value type migrations (and overall schema management).
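Until server-side schema tooling exists, a client can track the value type it has used for each key and catch conflicts early. The sketch below is a hypothetical local check (`value_type` and `check_attribute_types` are invented names). It classifies values into the documented types (string, unsigned integer, or arrays of either) and ignores nulls. It also assumes an empty array is accepted for any array-typed key, as the `[]` in the column-based example suggests; that rule is an assumption, not documented behavior.

```python
from typing import Any, Optional


def value_type(value: Any) -> Optional[str]:
    """Classify one attribute value; None (null) carries no type."""
    if value is None:
        return None
    if isinstance(value, bool):
        raise TypeError("booleans are not a documented attribute type")
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        if value < 0:
            raise ValueError("only unsigned integers are supported")
        return "uint"
    if isinstance(value, list):
        inner = {value_type(v) for v in value}
        if len(inner) > 1 or not inner <= {"string", "uint"}:
            raise TypeError("arrays must hold only strings or only unsigned integers")
        return f"{inner.pop()}[]" if inner else "[]"  # "[]": empty, element type unknown
    raise TypeError(f"unsupported attribute value {value!r}")


def check_attribute_types(schema: dict[str, str],
                          attributes: dict[str, list[Any]]) -> dict[str, str]:
    """Return `schema` extended with the types in `attributes`; raise on a conflict."""
    schema = dict(schema)  # do not mutate the caller's copy
    for name, column in attributes.items():
        for value in column:
            new = value_type(value)
            if new is None:
                continue
            known = schema.get(name)
            if known is None or known == new or (known == "[]" and new.endswith("[]")):
                schema[name] = new  # first sighting, same type, or refines an empty array
            elif new == "[]" and known.endswith("[]"):
                continue  # assumption: an empty array fits any array-typed key
            else:
                raise TypeError(f"attribute {name!r}: {new} conflicts with existing {known}")
    return schema


schema = check_attribute_types({}, {"foo": ["bar", None]})
print(schema)  # {'foo': 'string'}
```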

Examples

Vector Update or Insert

Bulk vector operations use a column-oriented layout for vectors, ids, and attributes.

// Request payload
{
  "ids": [1, 2, 3, 4],
  "vectors": [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]],
  "attributes": {
    "my-string": ["one", null, "three", "four"],
    "my-uint": [12, null, 84, 39],
    "my-string-array": [["a", "b"], ["b", "d"], [], ["c"]]
  },
  "distance_metric": "cosine_distance"
}

// Response payload
{
  "status": "OK"
}

Vector Deletion

Vectors can be deleted by upserting their IDs with a null vector.

// Request payload
{
  "ids": [2, 3],
  "vectors": [null, null]
}

// Response payload
{
  "status": "OK"
}
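A deletion payload only needs one null per ID. The sketch below is a minimal, hypothetical helper (`delete_payload`) that builds the column-based request body for a list of IDs, mirroring the deletion example above.

```python
import json


def delete_payload(ids: list) -> dict:
    """Column-based payload that deletes `ids` by upserting a null vector for each."""
    return {"ids": list(ids), "vectors": [None] * len(ids)}


print(json.dumps(delete_payload([2, 3])))
# {"ids": [2, 3], "vectors": [null, null]}
```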

Row-based API

Upsert operations can also be specified row by row using the upserts parameter.

// Request payload
{
  "distance_metric": "cosine_distance",
  "upserts": [
    {
      "id": 1,
      "vector": [0.1, 0.1],
      "attributes": {
        "my-string": "one",
        "my-uint": 12,
        "my-string-array": ["a", "b"]
      }
    },
    {
      "id": 2,
      "vector": [0.2, 0.2],
      "attributes": {
        "my-string-array": ["b", "d"]
      }
    },
    {
      "id": 3,
      "vector": [0.3, 0.3],
      "attributes": {
        "my-string": "three",
        "my-uint": 84
      }
    },
    {
      "id": 4,
      "vector": [0.4, 0.4],
      "attributes": {
        "my-string": "four",
        "my-uint": 39,
        "my-string-array": ["c"]
      }
    }
  ]
}

// Response payload
{
  "status": "OK"
}

© 2024 turbopuffer Inc.

All rights reserved.
