
Guarantees

This document serves as a brief reference of turbopuffer's guarantees:

  • Durable Writes. Writes are committed to object storage upon successful return by turbopuffer's API.
  • Consistent Reads. Queries return the latest data by default, but can be configured to trade read consistency for performance. Most updates are visible immediately because queries usually hit the node that handled the write. Over 99.99% of queries return consistent data; in rare cases (e.g. during scaling), reads may be briefly stale (typically limited to ~100ms, with a strict upper bound of 1 minute). The cache refreshes on every query, so the latest writes appear on the next request.
  • Atomic Conditional Writes. Conditional writes evaluate their condition atomically with the write.
  • Atomic Batches. All writes in an upsert are applied simultaneously.
  • Any node can serve queries for any namespace. High availability (HA) does not come as a cost/reliability trade-off. Our HA is the number of query nodes we run.
  • Object storage is the only stateful dependency. This means there is no separate consensus plane that needs to be maintained and scaled independently, which simplifies the system's operations and thus improves reliability. All concurrency control is delegated to object storage.
  • Compute-Compute Separation. Query nodes handle queries and writes to object storage and the write-through cache. All expensive computation happens on separate, auto-scaled indexing nodes.
  • Smart Caching. After a cold query, data is cached on NVMe SSD, and frequently accessed namespaces are stored in memory. turbopuffer does not need to load an entire namespace into cache before querying it. The storage engine is designed to perform small, ranged reads directly against object storage for fast cold queries. Cache warming hints can eliminate user-visible cold query latency in many applications.
  • Autoscaling. Query and indexing nodes automatically scale with demand.

Regarding ACID properties: turbopuffer provides Atomicity, Consistency, and Durability.

Isolation is not broadly applicable, as general purpose read-write transactions are not supported. However, limited read-write semantics are available through conditional writes, patch_by_filter, and delete_by_filter.

Conditional writes evaluate their reads and writes atomically. This prevents concurrency anomalies such as Lost Updates and ensures they behave as if run under Serializable isolation, the strongest isolation level.
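The following is a minimal sketch of why evaluating the condition atomically with the write prevents Lost Updates. It is an in-memory toy model, not turbopuffer's implementation or API: `ToyNamespace`, `conditional_upsert`, and the `version` attribute are hypothetical names introduced only for illustration.

```python
import threading


class ToyNamespace:
    """In-memory stand-in for a namespace (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def get(self, doc_id):
        with self._lock:
            doc = self._docs.get(doc_id)
            return dict(doc) if doc is not None else None

    def conditional_upsert(self, doc_id, new_doc, condition):
        """Write new_doc only if condition(current_doc) holds.

        The check and the write happen under one lock, so no other
        writer can slip in between them.
        """
        with self._lock:
            current = self._docs.get(doc_id)
            if not condition(current):
                return False
            self._docs[doc_id] = dict(new_doc)
            return True


ns = ToyNamespace()
ns.conditional_upsert(1, {"version": 1, "stock": 10}, lambda cur: cur is None)

# Two clients read the same version, then both try to write.
seen_a = ns.get(1)
seen_b = ns.get(1)

ok_a = ns.conditional_upsert(
    1,
    {"version": 2, "stock": seen_a["stock"] - 1},
    lambda cur: cur is not None and cur["version"] == seen_a["version"],
)
ok_b = ns.conditional_upsert(
    1,
    {"version": 2, "stock": seen_b["stock"] - 3},
    lambda cur: cur is not None and cur["version"] == seen_b["version"],
)

print(ok_a, ok_b, ns.get(1))
```

In this model the second write is rejected because its condition no longer holds, so the first client's update is not silently overwritten. The second client would need to re-read and retry.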

patch_by_filter and delete_by_filter execute in two phases:

  1. They evaluate their filter against a namespace snapshot to identify matching document IDs.
  2. They atomically re-evaluate their filter against those matching IDs and modify documents that still satisfy the condition.

Re-evaluation ensures that documents that no longer satisfy the condition are never modified. However, documents that newly qualify because they were inserted or updated between the two phases will be missed. As a result, patch_by_filter and delete_by_filter behave as if run under Read Committed isolation, similar to UPDATE ... SET ... WHERE ... and DELETE FROM ... WHERE ... in PostgreSQL.
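The two phases can be sketched with a small single-threaded simulation. This is an illustrative model only: `phase_one`, `phase_two`, `matches`, and the `status` attribute are hypothetical, and the atomicity of the second phase is assumed rather than modeled.

```python
def matches(doc):
    # Hypothetical filter: status == "pending"
    return doc.get("status") == "pending"


def phase_one(docs, predicate):
    """Evaluate the filter against a snapshot and return matching IDs."""
    snapshot = {doc_id: dict(doc) for doc_id, doc in docs.items()}
    return [doc_id for doc_id, doc in snapshot.items() if predicate(doc)]


def phase_two(docs, candidate_ids, predicate, patch):
    """Re-check each candidate against current data, then patch it."""
    patched = []
    for doc_id in candidate_ids:
        doc = docs.get(doc_id)
        if doc is not None and predicate(doc):
            doc.update(patch)
            patched.append(doc_id)
    return patched


docs = {
    1: {"status": "pending"},
    2: {"status": "pending"},
    3: {"status": "done"},
}

candidates = phase_one(docs, matches)  # IDs 1 and 2 match the snapshot

# Concurrent writes that land between the two phases:
docs[2]["status"] = "done"        # doc 2 no longer matches
docs[3]["status"] = "pending"     # doc 3 newly matches
docs[4] = {"status": "pending"}   # doc 4 is newly inserted and matches

patched = phase_two(docs, candidates, matches, {"status": "archived"})
print(patched)
```

Only document 1 is patched. Document 2 is skipped by the re-evaluation, while documents 3 and 4 are missed because they were not in the phase-one result.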

Regarding the CAP theorem: turbopuffer prioritizes consistency over availability when object storage is unreachable. You can adjust this to favor availability through query configuration.
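As a rough illustration of this trade-off, the toy model below either refuses to answer when its source of truth is unreachable (favoring consistency) or serves a possibly stale cached value (favoring availability). `ToyQueryNode`, `StorageUnreachable`, and the `favor_availability` flag are hypothetical names for this sketch and do not reflect turbopuffer's actual query configuration.

```python
class StorageUnreachable(Exception):
    """Raised when the toy object store cannot be reached."""


class ToyQueryNode:
    """Toy query node with a local cache in front of an object store."""

    def __init__(self, initial):
        self.storage = dict(initial)  # stand-in for object storage
        self.cache = dict(initial)    # stand-in for the node's cache
        self.storage_up = True

    def query(self, key, favor_availability=False):
        if self.storage_up:
            # Refresh the cache from the source of truth on every query.
            self.cache[key] = self.storage[key]
            return self.cache[key]
        if favor_availability:
            # Serve whatever is cached, which may be stale.
            return self.cache[key]
        raise StorageUnreachable("cannot confirm the latest data")


node = ToyQueryNode({"doc": "v1"})
node.storage["doc"] = "v2"   # a write the cache has not seen yet
node.storage_up = False      # object storage becomes unreachable

try:
    node.query("doc")
    consistent_result = "answered"
except StorageUnreachable:
    consistent_result = "refused"

available_result = node.query("doc", favor_availability=True)
print(consistent_result, available_result)
```

The consistent query refuses to answer, while the availability-favoring query returns the older cached value.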

For more details, see Tradeoffs, Limits, and Architecture.
