Configuration

turbopuffer is configurable by modifying a Kubernetes ConfigMap in the turbopuffer namespace of your deployment.

The turbopuffer team works with you to manage your deployment, for example by proposing ConfigMap changes for your cluster to tune cache sizes, LSM settings, or recall.

To update the ConfigMap, you can either use the Helm chart with the values.yaml you maintain for the cluster:

Change values.yaml in the onprem-kit/tpuf directory and run helm upgrade --install --values=values.yaml turbopuffer tpuf.

Or, you can update the ConfigMap directly with kubectl edit -n turbopuffer configmap turbopuffer-settings.

After updating the ConfigMap via Helm or manually, you must restart tpuf to apply the changes:

kubectl rollout restart sts/turbopuffer-index
kubectl rollout restart sts/turbopuffer-query

authentication.allowed_api_keys_sha256 object

A mapping of org IDs to allowed API keys. Each API key is expected to be given as a 44-character, base64-encoded SHA-256 hash.

  1. Generate an API key. This is the API key you'll put in your client secrets. You can generate it however you like, for example: openssl rand -hex 32.
  2. Hash the API key with SHA-256 and base64-encode the digest. This is the value you'll put in the tpuf config. For example: echo -n KEY | openssl dgst -sha256 -binary | base64.

[authentication.allowed_api_keys_sha256]
5X8OlKguH1l2jvTJrPgnvlcM = [ # Org ID from the dashboard
  "IaG0JUcIiCXKwqhIWH8Qr0incF2xsbRZRRJJxznl0GM=" # SHA-256 + Base64 API key
]
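The two steps above can also be scripted. The following is a minimal sketch in Python using only the standard library; the function names generate_api_key and hash_api_key are hypothetical and not part of turbopuffer. It is meant to mirror the openssl commands shown above: a random 32-byte key rendered as hex, then the base64 encoding of its SHA-256 digest.

```python
import base64
import hashlib
import secrets


def generate_api_key() -> str:
    """Return a random 64-character hex key, similar to `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def hash_api_key(api_key: str) -> str:
    """Return the base64-encoded SHA-256 digest of the key (always 44 characters)."""
    digest = hashlib.sha256(api_key.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    key = generate_api_key()
    print("client secret:", key)                # goes in your client secrets
    print("config value: ", hash_api_key(key))  # goes in the tpuf config
```

A SHA-256 digest is 32 bytes, which base64 encodes to 44 characters including one trailing = of padding, matching the length the setting expects.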

cache.memory_budget_bytes_by_priority array

The absolute amount of memory, in bytes, used for caching. We recommend setting this to node_memory - 64GB to leave headroom for writes and other operations. We aim to reduce headroom requirements.

[cache]
memory_budget_bytes_by_priority = [100_000_000_000] # Query default for 176GB machines
memory_budget_bytes_by_priority = [100_000_000_000] # Index default for 176GB machines

fairness.query_concurrency_per_namespace number

Maximum number of concurrent queries allowed to a single namespace. This protects the node from being overloaded by a single namespace. Queries are rejected with a 429 if there is not enough capacity to handle them.

[fairness]
query_concurrency_per_namespace = 16 # default

fairness.query_bulkhead_wait_ms number

Maximum time, in milliseconds, that a query waits once the query concurrency limit is reached.

[fairness]
query_bulkhead_wait_ms = 800 # default
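To illustrate how the two fairness settings interact, here is a minimal sketch of a per-namespace bulkhead in Python. It is not turbopuffer's implementation: the NamespaceBulkhead class, its run method, and the (status, result) return shape are all hypothetical, and it assumes that a query which cannot get a slot within the wait time is the one that receives the 429.

```python
import threading
from collections import defaultdict


class NamespaceBulkhead:
    """Illustrative per-namespace concurrency limiter (hypothetical, for explanation only)."""

    def __init__(self, concurrency: int = 16, wait_ms: int = 800):
        self._wait_s = wait_ms / 1000
        self._lock = threading.Lock()
        # One semaphore per namespace, created lazily on first use.
        self._slots = defaultdict(lambda: threading.BoundedSemaphore(concurrency))

    def run(self, namespace: str, query_fn):
        with self._lock:
            slot = self._slots[namespace]
        # Wait up to wait_ms for a free slot; give up with a 429 otherwise.
        if not slot.acquire(timeout=self._wait_s):
            return 429, None
        try:
            return 200, query_fn()
        finally:
            slot.release()
```

In this sketch, a busy namespace only exhausts its own slots, so queries to other namespaces on the same node are unaffected by its limit.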

cache.prewarm.keep_warm_orgs object

A set of org IDs to keep warm in cache. On startup, each node prewarms the namespaces for these orgs so that their cache is hot.

Not recommended for most users.

[cache.prewarm]
keep_warm_orgs = [
  '<premium-users-org>',
  '<no-cold-starts-pls-org>',
]

cache.disk_budget_bytes number

The amount of local SSD capacity to use as a cache, given either as an absolute number of bytes or as a fraction of capacity (the default, 0.985, is 98.5%).

Changing this is not recommended for most users.

[cache]
disk_budget_bytes = 0.985 # default, leaving room for ext4 reserved blocks

indexing.cache_fill_concurrency number

Number of cache fills to allow concurrently in the background per node. These are triggered after a cold query.

Cache fills are prioritized for the files that matter most for query speed, e.g. centroids, so that queries get faster sooner.

[indexing]
cache_fill_concurrency = 2 # default

indexing.reindex_unindexed_documents_min number

Require a minimum of this many unindexed documents within a namespace to trigger a reindex. Prevents excessive indexing in the presence of few writes.

[indexing]
reindex_unindexed_documents_min = 5000 # default

indexing.reindex_unindexed_documents_max number

The maximum number of documents we'll allow to remain unindexed. If the namespace has at least this many unindexed documents, a /index call will always trigger an index operation.

[indexing]
reindex_unindexed_documents_max = 50_000 # default

indexing.unindexed_documents_ratio number

The ratio of unindexed to indexed documents at which to trigger indexing. For example, 0.1 means we index once 10% of the namespace is unindexed. Further constrained by both reindex_unindexed_documents_min and reindex_unindexed_documents_max.

[indexing]
unindexed_documents_ratio = 0.1 # 10%, default
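One way to read how the three reindex thresholds combine is sketched below in Python. This is an interpretation of the descriptions above, not turbopuffer's actual logic: the should_reindex function is hypothetical, and it assumes the ratio is computed as unindexed / indexed and that the max threshold overrides the ratio while the min threshold gates it.

```python
def should_reindex(
    unindexed: int,
    indexed: int,
    *,
    min_docs: int = 5_000,   # reindex_unindexed_documents_min
    max_docs: int = 50_000,  # reindex_unindexed_documents_max
    ratio: float = 0.1,      # unindexed_documents_ratio
) -> bool:
    """Decide whether a namespace should be reindexed (illustrative only)."""
    # At or above the max, always index, regardless of the ratio.
    if unindexed >= max_docs:
        return True
    # Below the min, never index, to avoid churn from a few writes.
    if unindexed < min_docs:
        return False
    # Nothing indexed yet: the ratio is unbounded, so index.
    if indexed == 0:
        return True
    # Between min and max, the ratio decides.
    return unindexed / indexed >= ratio
```

Under these assumptions, a namespace with 10,000,000 indexed documents reindexes at 50,000 unindexed documents (the max) rather than waiting for 1,000,000, and a namespace with 1,000 indexed documents waits for 5,000 unindexed documents (the min) rather than reindexing at 100.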

indexing.batch_size_bytes number

During indexing, the number of document bytes to process before flushing. An indexing run can consist of multiple batches, with progress flushed incrementally after each batch.

[indexing]
batch_size_bytes = 1_000_000_000 # 1 GB, default

tracing.otlp_endpoint string

The OTLP endpoint to emit traces to, if any. Should end with /v1/traces. If empty, traces won't be emitted.

[tracing]
otlp_endpoint = "http://localhost:4318/v1/traces"

stats.host string

The host of the statsd endpoint in use, if any. If set, metrics will be emitted to this endpoint.

[stats]
host = "" # defaults to none

stats.port string

The port of the statsd endpoint to use. Ignored if stats.host is empty.

[stats]
port = "" # defaults to none

blob.max_concurrent_requests number

The maximum number of concurrent requests in flight to object storage at any given time.

[blob]
max_concurrent_requests = 10_000 # default for query nodes
max_concurrent_requests = 20_000 # default for indexing nodes

storage.lsm_ttl_seconds number

The amount of time data can live in the LSM tree before being force-compacted.

This setting serves two purposes:

  • Query performance. Compaction speeds up queries, so compacting more frequently makes queries more efficient.
  • Compliance. If a customer requires that documents deleted via the API are fully removed within X days, setting this to a value below X days ensures that the index no longer contains residual data from the deleted documents.

[storage]
lsm_ttl_seconds = 1728000 # 20 days, default
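Because the setting is expressed in seconds while deletion requirements are usually stated in days, a small conversion helper can prevent arithmetic slips. The sketch below is in Python; both function names and the one-day safety margin are hypothetical choices for illustration, not turbopuffer recommendations.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86_400


def days_to_seconds(days: int) -> int:
    """Convert whole days to seconds."""
    return days * SECONDS_PER_DAY


def ttl_for_deletion_deadline(deadline_days: int, margin_days: int = 1) -> int:
    """Return a TTL in seconds that is strictly below the deletion deadline."""
    if margin_days < 1 or margin_days >= deadline_days:
        raise ValueError("margin_days must be at least 1 and less than deadline_days")
    return days_to_seconds(deadline_days - margin_days)


if __name__ == "__main__":
    print(days_to_seconds(20))            # the documented default of 20 days
    print(ttl_for_deletion_deadline(30))  # a 30-day deadline with a 1-day margin
```

The first value reproduces the default shown above (20 days is 1,728,000 seconds). The second shows a TTL chosen to stay below a 30-day deletion requirement.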