
Optimizing Performance

turbopuffer is designed to be performant by default, but there are ways to optimize performance further. These suggestions aren't requirements for good performance; rather, they highlight opportunities for improvement when you have the flexibility to choose.

For example, while a single namespace with 10M documents works fine, splitting it into 10 namespaces of 1M documents each will yield better query performance if there's a natural way to group the documents.

  • Choose the region closest to your backend. We can't beat the speed of light. If there isn't a region close to you and latency is paramount, contact us.
  • U64 or UUID IDs: The smaller the IDs, the faster the puffin'. A UUID encoded as a string is 36 bytes, whereas the native UUID type is 16 bytes. A u64 is even smaller at 8 bytes.
  • filterable: false. For attributes you never intend to filter on, marking them as filterable: false will improve indexing performance and grant you a 50% discount. For large attribute values this can improve performance and cost significantly (see the first sketch after this list).
  • Use small namespaces. The rule of thumb is to make the namespaces as small as they can be without having to routinely query more than one at a time. If documents have significantly different schemas, it's also worth splitting them. Don't try to be too clever. Smaller namespaces will be faster to query and index.
  • Prewarm namespaces with dark queries. If your application is latency-sensitive, consider sending a query to the namespace before the user interacts with it (e.g. when they open the search or chat dialog) to start warming the cache for that namespace (see the prewarm sketch after this list).
  • Smaller vectors are faster. For example, 512-dimensional vectors will be faster to search than 1536-dimensional ones. As you lose dimensions, you generally also lose precision, so you should weigh the tradeoff with your own evals and benchmarks.
  • Batch upserts. If you're upserting a lot of documents, batch them into fewer, larger requests. This improves performance and leverages batch discounts of up to 50%. Each individual upsert request can be a maximum of 256MB (see the upsert sketch after this list).
  • Concurrent upserts. If you're upserting a lot of documents, consider using multiple processes to upsert batches in parallel. Especially for single-threaded runtimes like Node.js or Python, this can be a significant performance boost, as upserting is generally bottlenecked by client-side serialization and compression; the upsert sketch after this list combines this with batching.
  • Control include_attributes & include_vectors. The more data we have to return, the slower the query. Only request the attributes you actually need; the prewarm sketch after this list shows both knobs.
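
A rough sketch of the filterable: false advice, using a direct HTTP upsert from Python with requests. The endpoint path and the upserts, schema, and filterable field names are assumptions about the REST API's general shape rather than a verbatim reference; check the API docs for the exact request format.

```python
# Minimal sketch: upsert with a schema that disables filtering on a large
# attribute. Endpoint path and field names are assumptions; consult the
# turbopuffer API reference for the exact request shape.
import requests

API_KEY = "tpuf_..."  # placeholder; load from your secrets manager in practice

resp = requests.post(
    "https://api.turbopuffer.com/v1/vectors/my-namespace",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "upserts": [
            {
                "id": 1,  # u64 IDs are the most compact choice
                "vector": [0.1, 0.2, 0.3],
                "attributes": {"title": "hello", "body": "a long chunk of text ..."},
            }
        ],
        # "body" is never filtered on, so skip building a filter index for it.
        "schema": {"body": {"type": "string", "filterable": False}},
    },
)
resp.raise_for_status()
```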
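
A sketch of batched, concurrent upserts against the same assumed endpoint. The upsert_batch helper, the document shape, and the batch size are illustrative choices, not part of the API; the point is large batches sent from a small process pool so serialization and compression aren't stuck behind one interpreter.

```python
# Sketch: split documents into large batches and upsert them from multiple
# processes. upsert_batch, the document shape, and BATCH_SIZE are hypothetical.
from concurrent.futures import ProcessPoolExecutor

import requests

API_KEY = "tpuf_..."   # placeholder
BATCH_SIZE = 10_000    # tune so each request stays well under the 256MB cap

def upsert_batch(batch):
    """Send one batch of {"id", "vector", "attributes"} dicts as a single upsert."""
    resp = requests.post(
        "https://api.turbopuffer.com/v1/vectors/my-namespace",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"upserts": batch},
    )
    resp.raise_for_status()

def upsert_all(documents):
    batches = [documents[i : i + BATCH_SIZE] for i in range(0, len(documents), BATCH_SIZE)]
    # Processes rather than threads, so JSON serialization and compression
    # run in parallel instead of contending for a single interpreter.
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(upsert_batch, batches))
```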
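
Finally, a sketch of a prewarming dark query that also restricts the response to what the UI actually needs. The query path and the exact top_k, include_attributes, and include_vectors parameter shapes are again assumptions to verify against the API reference.

```python
# Sketch: fire a cheap query when the user opens the search dialog to start
# warming the namespace cache, returning only the attributes we display.
import requests

API_KEY = "tpuf_..."  # placeholder

def prewarm(namespace: str, probe_vector: list[float]) -> None:
    """Fire a query whose results are discarded; its only job is to warm the cache."""
    resp = requests.post(
        f"https://api.turbopuffer.com/v1/vectors/{namespace}/query",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "vector": probe_vector,
            "top_k": 1,                       # keep the dark query as cheap as possible
            "include_attributes": ["title"],  # only return what the UI will display
            "include_vectors": False,         # skip shipping full vectors back
        },
    )
    resp.raise_for_status()
```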