Every technology has tradeoffs. This document outlines turbopuffer's key design choices to help inform your evaluation:
GET IF-NOT-MATCH latency and
should improve as object storage technology advances. For workloads requiring
sub-10ms latency, you can enable eventual consistency. S3's
metadata p50=10ms p90=17ms, GCS's metadata p50=12-18ms p90=15-25ms (more region-dependent).

| turbopuffer excels at | turbopuffer may not currently be the best fit for |
|---|---|
| Large scale (100B+ documents/vectors) with lots of namespaces (tens of millions) | Low scale, free tier |
| Naturally sharded data (e.g. B2B where each tenant's data is isolated in its own namespace) | Extensive 1st-stage ranking (we encourage generating a candidate set with hybrid search and refining/re-ranking further in your own 2nd stage) |
| Cost-effectiveness | Built-in 2nd-stage re-ranking (we encourage you to do it in {search.py,search.ts,..}) |
| Fast cold starts | Built-in embedding (this is a few lines of code at most) |
| Reliability | Open Source |
| Hybrid search (BM25 + vector search) | |
| Support from DB Engineers | |
| Deploy into your VPC (BYOC) | |
| Heavy writes (Appends, Updates and Deletes) | |

For more details, see the Guarantees, Limits, and Architecture pages.
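
The table above suggests generating a candidate set with hybrid search and doing 2nd-stage re-ranking in your own code. As a minimal sketch of what that client-side step could look like, the snippet below fuses a BM25 result list and a vector result list with reciprocal rank fusion (RRF). RRF is one common fusion technique chosen here for illustration, not a method prescribed by turbopuffer. The function name, the document IDs, and the constant `k=60` are all assumptions, and the two input lists stand in for results you would obtain from separate BM25 and vector queries.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate sets, best match first.
bm25_ids = ["doc3", "doc1", "doc7"]    # stand-in for BM25 query results
vector_ids = ["doc1", "doc9", "doc3"]  # stand-in for vector query results

fused = reciprocal_rank_fusion([bm25_ids, vector_ids])
print(fused)
```

Documents that appear in both lists (`doc1` and `doc3` here) rise to the top. The fused list can then be passed to a heavier re-ranker, such as a cross-encoder, if your application needs one.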