turbopuffer is designed to be performant by default, but there are ways to
optimize performance further. These suggestions aren't requirements for good
performance--rather, they highlight opportunities for improvement when you have
the flexibility to choose.
For example, while a single namespace with 10M documents works fine, splitting
it into 10 namespaces of 1M documents each will yield better query performance
if there's a natural way to group the documents.
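
As a rough sketch of what a natural grouping could look like, the snippet below buckets documents by a key before they are written, so that each bucket can live in its own namespace. The `tenant_id` field and the `docs-<tenant>` naming scheme are assumptions made for illustration; they are not part of turbopuffer's API.

```python
from collections import defaultdict


def group_by_namespace(documents, key="tenant_id", prefix="docs"):
    """Bucket documents by a grouping key, one bucket per namespace name.

    `key` and `prefix` are hypothetical; use whatever grouping is natural
    for your data (tenant, workspace, project, ...).
    """
    buckets = defaultdict(list)
    for doc in documents:
        buckets[f"{prefix}-{doc[key]}"].append(doc)
    return dict(buckets)


documents = [
    {"id": 1, "tenant_id": "acme", "text": "hello"},
    {"id": 2, "tenant_id": "globex", "text": "world"},
    {"id": 3, "tenant_id": "acme", "text": "again"},
]
namespaces = group_by_namespace(documents)
```

Each key of `namespaces` would then be written to, and queried as, its own namespace.
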
- Choose the region closest to your backend. We can't beat the
speed of light. If there isn't a region close to you and latency is
paramount, contact us.
- U64 or UUID IDs: The smaller the IDs, the faster the puffin'. A UUID
encoded as a string is 36 bytes, whereas the UUID-native type is 16
bytes. A u64 is even smaller at 8 bytes.
- filterable: false. Marking attributes you never intend to filter on as
filterable: false improves indexing performance and grants you a 50%
discount. For large attribute values this can improve performance and cost
significantly.
- Use small namespaces. The rule of thumb is to make the namespaces as
small as they can be without having to routinely query more than one at a time.
If documents have significantly different schemas, it's also worth splitting
them. Don't try to be too clever. Smaller namespaces will be faster to query and index.
- Prewarm namespaces with dark queries. If your application is
latency-sensitive, consider sending a query to the namespace before the user
interacts with it (e.g. when they open the search or chat dialog) to start
warming the cache for the namespace.
- Smaller vectors are faster. Lower-dimensional vectors are faster to search,
e.g. 512 dimensions will be faster than 1536 dimensions. As you lose dimensions,
you generally also lose precision, so weigh the tradeoff with your own evals
and benchmarks.
- Batch upserts. If you're upserting a lot of documents, consider batching
them into fewer, larger upserts. This improves performance and takes advantage
of batch discounts of up to 50%. Each individual upsert batch request can be a
maximum of 256MB.
- Concurrent upserts. If you're upserting a lot of documents, consider using
multiple processes to upsert batches in parallel. Especially for single-threaded
runtimes like Node.js or Python, this can be a significant performance boost as
upserting is generally bottlenecked by serialization and compression.
- Control include_attributes & include_vectors. The more data we have to
return, the slower the query will be. Request only the attributes you need,
and include vectors only when you actually use them.
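
The ID sizes quoted in the list above can be checked with Python's standard library. This is only a sanity check of the raw encodings; it makes no claim about how turbopuffer stores IDs internally, and the example UUID and integer are arbitrary.

```python
import struct
import uuid

doc_uuid = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_string = str(doc_uuid).encode("ascii")  # hyphenated text form
as_native = doc_uuid.bytes                 # raw 128-bit value
as_u64 = struct.pack("<Q", 42)             # an unsigned 64-bit integer ID

sizes = {
    "uuid_string": len(as_string),
    "uuid_native": len(as_native),
    "u64": len(as_u64),
}
print(sizes)  # {'uuid_string': 36, 'uuid_native': 16, 'u64': 8}
```
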
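
For the batching tip above, one possible approach is to group documents into batches by estimated serialized size so that each request stays under the 256MB limit. The sketch below is a minimal illustration, not turbopuffer's client API: `batch_documents` is a hypothetical helper, the size estimate uses uncompressed JSON, and reading 256MB as 256,000,000 bytes is a conservative assumption.

```python
import json

# Per-request limit from the tip above; the exact byte count is an assumption.
MAX_BATCH_BYTES = 256_000_000


def batch_documents(documents, max_bytes=MAX_BATCH_BYTES):
    """Yield lists of documents whose estimated JSON size stays under max_bytes.

    The estimate ignores request envelope overhead, so leave headroom
    in practice.
    """
    batch, batch_bytes = [], 0
    for doc in documents:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if doc_bytes > max_bytes:
            raise ValueError("single document exceeds the batch size budget")
        # Start a new batch when adding this document would exceed the budget.
        if batch and batch_bytes + doc_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


# Small budget so the splitting is visible with toy data.
docs = [{"id": i, "text": "x" * 100} for i in range(10)]
batches = list(batch_documents(docs, max_bytes=400))
```

Each batch would then be handed to whatever upsert call your client exposes. Since the batches are independent, they can also be distributed across several worker processes, as the concurrent upserts tip suggests.
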