Every technology has tradeoffs. This document outlines turbopuffer's key design choices to help inform your evaluation:
With consistent reads, query latency is bounded below by object storage `GET IF-NOT-MATCH` latency and should improve as object storage technology advances. For workloads requiring sub-10ms latency, you can enable eventual consistency. Metadata latency is p50=10ms, p90=17ms on S3 and p50=12-18ms, p90=15-25ms on GCS (more region-dependent).

turbopuffer excels at | turbopuffer may not currently be the best fit for |
---|---|
Large scale (1B+ documents/vectors) with lots of namespaces (tens of millions) | 🔜 Large namespaces (250M+) |
Naturally sharded data (e.g. B2B where each tenant's data is isolated in its own namespace) | Low scale, free tier |
Cost-effectiveness | 🔜 Aggregation (e.g. group by, sums, explore clusters, ...) |
Fast cold starts | Extensive 1st-stage ranking (we encourage generating a candidate set with hybrid search and refining/re-ranking further in your own 2nd stage) |
Reliability | Built-in 2nd-stage re-ranking (we encourage you to do it in `{search.py,search.ts,..}`) |
Hybrid search (BM25 + vector search) | Built-in embedding (ditto) |
Support from DB Engineers | Open Source |
Deploy into your VPC (BYOC) | |
Heavy writes (Appends, Updates and Deletes) | |
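
The table suggests generating a candidate set with hybrid search and doing the second-stage re-ranking in your own code. A minimal sketch of that split follows. It assumes the two first-stage result lists (BM25 and vector) are already available as ordered document IDs, merges them with reciprocal rank fusion (a common fusion technique chosen here for illustration, not something this page prescribes), and then re-ranks with a scoring function you supply. The names `reciprocal_rank_fusion`, `rerank`, and the constant `k=60` are hypothetical and not part of any turbopuffer API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists into one candidate list.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1).
    Ties are broken by document ID so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def rerank(candidates: Sequence[str], score: Callable[[str], float], top_k: int) -> List[str]:
    """Second stage: order candidates by an application-defined score, highest first."""
    return sorted(candidates, key=score, reverse=True)[:top_k]


# Hypothetical first-stage results (ordered document IDs).
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
candidates = reciprocal_rank_fusion([bm25_hits, vector_hits])

# Hypothetical second-stage scores, e.g. from a cross-encoder you run yourself.
second_stage_scores = {"a": 0.9, "b": 0.1, "c": 0.7, "d": 0.5}
final = rerank(candidates, second_stage_scores.__getitem__, top_k=2)
```

Keeping the second stage in application code means the scoring function can be swapped without changing how candidates are retrieved.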

For more details, see the Guarantees, Limits, and Architecture pages.
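
To illustrate why consistent reads carry a latency floor, here is a toy model of conditional-GET revalidation. It assumes that `GET IF-NOT-MATCH` refers to an HTTP-style conditional GET, where the client sends the ETag it has cached and the store replies "not modified" when nothing changed. `ToyObjectStore` and `CachingReader` are hypothetical in-memory stand-ins, not turbopuffer's implementation or any real object storage SDK.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Response:
    status: int            # 200 = body returned, 304 = not modified
    etag: str
    body: Optional[bytes]  # None when status is 304


class ToyObjectStore:
    """In-memory stand-in for an object store (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    @staticmethod
    def _etag(body: bytes) -> str:
        return hashlib.sha256(body).hexdigest()

    def put(self, key: str, body: bytes) -> str:
        self._objects[key] = body
        return self._etag(body)

    def get(self, key: str, if_none_match: Optional[str] = None) -> Response:
        body = self._objects[key]
        etag = self._etag(body)
        if if_none_match == etag:
            return Response(304, etag, None)  # caller's cached copy is current
        return Response(200, etag, body)


class CachingReader:
    """Serves from cache, but revalidates against the store on every read."""

    def __init__(self, store: ToyObjectStore) -> None:
        self._store = store
        self._cache: Dict[str, Tuple[str, bytes]] = {}
        self.round_trips = 0  # each one would cost a metadata round trip

    def read(self, key: str) -> bytes:
        cached = self._cache.get(key)
        self.round_trips += 1
        resp = self._store.get(key, if_none_match=cached[0] if cached else None)
        if resp.status == 304 and cached is not None:
            return cached[1]
        assert resp.body is not None
        self._cache[key] = (resp.etag, resp.body)
        return resp.body


store = ToyObjectStore()
store.put("ns/meta", b"v1")
reader = CachingReader(store)
first = reader.read("ns/meta")    # cold: full body fetched
second = reader.read("ns/meta")   # warm: revalidated, served from cache
store.put("ns/meta", b"v2")
third = reader.read("ns/meta")    # changed: new body fetched
```

In this model every read pays one round trip to the store even when the cache is warm, which is the cost a consistent read cannot avoid. An eventually consistent reader would skip the revalidation and accept possibly stale data.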