Namespace Pinning

                   ╔══turbopuffer region═════╗      ╔═══Object Storage═════════════════╗
                   ║      ┌────────────────┐ ║░     ║ ┏━━Indexing Queue━━━━━━━━━━━━━━┓ ║░
                   ║   ┌─▶│  ./tpuf query  │ ║░     ║ ┃■■■■■■■■■                     ┃ ║░
                ┌──╩─┐ │  └────────────────┘ ║░     ║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║░
╔══════════╗    │    │ │  ┌────────────────┐ ║░     ║ ┏━/{org_id}/{namespace}━━━━━━━━┓ ║░
║  Client  ║───▶│ LB │─┼─▶│  ./tpuf query  │─╬─────▶║ ┃ ┏━/wal━━━━━━━━━━━━━━━━━━━━━┓ ┃ ║░
╚══════════╝░   │    │ │  └────────────────┘ ║░     ║ ┃ ┃■■■■■■■■■■■■■■■◈◈◈◈       ┃ ┃ ║░
 ░░░░░░░░░░░░   └──╦─┘ │  ┌────────────────┐ ║░     ║ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃ ║░
                   ║   │─▶│  ./tpuf query  │ ║░     ║ ┃ ┏━/index━━━━━━━━━━━━━━━━━━━┓ ┃ ║░
                   ║   │  │ [pin:org1/nsA] │ ║░     ║ ┃ ┃■■■■■■■■■■■■■■■           ┃ ┃ ║░
                   ║   │  └────────────────┘ ║░     ║ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃ ║░
                   ║   │  ┌────────────────┐ ║░     ║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║░
                   ║   └─▶│  ./tpuf query  │ ║░     ╚══════════════════════════════════╝░
                   ║      │ [pin:org2/nsY] │ ║░      ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
                   ║      └────────────────┘ ║░
                   ╚═════════════════════════╝░
                    ░░░░░░░░░░░░░░░░░░░░░░░░░░░
    

By default, namespaces run on turbopuffer's shared, multi-tenant query infrastructure. That is the right choice for most workloads.

Namespace pinning reserves compute and NVMe SSD cache for a specific namespace. Once pinned, that namespace's queries run on those reserved resources, so its data stays warm, cost and performance stay predictable, and sustained high query volume is often much cheaper.

Multi-tenant vs. pinned

Multi-tenant (default): Shared compute and cache. This is the simplest and most cost-effective option for most namespaces. Smart caching adapts to traffic patterns to minimize query latency. For spiky traffic, you can warm the cache to reduce cold starts.

Pinned: Reserved compute and NVMe SSD cache for one namespace. This keeps the hot path warm and gives large, busy namespaces more predictable throughput, latency, and cost.

Pinning also changes billing: instead of per-query (TB Queried) pricing, pinned namespaces are billed in GB-hours based on namespace size and how long the namespace stays pinned. That means the effective cost per query goes down as query volume goes up, with a break-even point typically around 10 queries per second.

When to use pinning

Pinning is a good fit when:

  • You run sustained high query volume on a namespace, where GB-hours are cheaper than paying per query.
  • You want predictable query latency on a namespace, where occasional cold queries would hurt your product.
  • You want predictable cost on a namespace, where GB-hours are easier to forecast than per-query TB Queried.

For many small or naturally sharded namespaces, the default multi-tenant path is still the best choice.

As a rule of thumb, pinning is worth evaluating when a large (>16 GB) namespace sustains above 10 queries per second and you want more predictable cost and performance.
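The rule of thumb above can be written down directly. This is an illustrative helper, not an official API; the 16 GB and 10 QPS thresholds come from this guide:

```python
def worth_evaluating_pinning(namespace_size_gb: float, sustained_qps: float) -> bool:
    """Rule of thumb from this guide: consider pinning when a large
    (>16 GB) namespace sustains more than ~10 queries per second."""
    return namespace_size_gb > 16 and sustained_qps > 10

# A 100 GB namespace at 50 QPS is a candidate; a 2 GB one is not.
print(worth_evaluating_pinning(100, 50))  # True
print(worth_evaluating_pinning(2, 50))    # False
```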

How it works

  1. You turn pinning on for an existing namespace via the metadata API.
  2. turbopuffer loads the namespace into the SSD cache of reserved query nodes.
  3. All queries for that namespace route to those nodes.
  4. The namespace stays warm on those reserved SSDs for as long as it stays pinned.

Pinning usually takes less than 30 minutes. During that time, queries continue to work on the existing path with no downtime.

Pinning does not change durability: data remains stored durably in object storage whether pinned or not.
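Since pin-up can take up to half an hour, a client that needs to know when the namespace is fully pinned can poll the metadata endpoint. The sketch below assumes the GET response exposes a `pinning` object with a `status` field (this doc only names `pinning.status.utilization`); verify the exact schema against the Metadata docs:

```python
import time
from typing import Callable


def wait_until_pinned(fetch_metadata: Callable[[], dict],
                      poll_seconds: float = 30.0,
                      timeout_seconds: float = 1800.0) -> dict:
    """Poll namespace metadata until pinning reports active.

    `fetch_metadata` should GET /v1/namespaces/:namespace/metadata and
    return the parsed JSON. Pinning usually completes within 30 minutes,
    hence the default timeout. Treat the `pinning.status` shape as an
    assumption -- consult the Metadata docs for the real schema.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        metadata = fetch_metadata()
        pinning = metadata.get("pinning")
        if pinning and pinning.get("status"):  # hypothetical status object
            return metadata
        time.sleep(poll_seconds)
    raise TimeoutError("namespace was not pinned within the timeout")
```

Queries keep working against the existing path while this loop waits, so there is no urgency to the polling interval.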

Replicas

Replicas increase read throughput.

Each replica runs on its own reserved query node, and reads are load-balanced across them. Throughput scales linearly with replica count.

A single replica can handle between 100 and 1000 QPS, depending on query shape, filters, and namespace size. Filtered vector and full-text search queries fall in the middle of this range.
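Because throughput scales linearly, a rough capacity plan follows directly. The 300 QPS default below is an illustrative mid-range figure, not a guarantee; measure your own workload:

```python
import math


def replicas_needed(target_qps: float, per_replica_qps: float = 300) -> int:
    """Estimate replica count for a target sustained read throughput.

    Throughput scales linearly with replica count. A single replica
    handles roughly 100-1000 QPS depending on query shape, filters, and
    namespace size; 300 is an assumed mid-range value for illustration.
    """
    return max(1, math.ceil(target_qps / per_replica_qps))


print(replicas_needed(1000))  # 4 replicas at ~300 QPS each
print(replicas_needed(50))    # 1
```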

To decide when to add replicas, monitor pinning.status.utilization on GET /v1/namespaces/:namespace/metadata (see Metadata). If utilization stays high or if queries return HTTP 429 (Too Many Requests) errors, add more replicas to increase read capacity.
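That monitoring rule can be sketched as a small decision function. The 0.8 utilization threshold is an assumed high-water mark, not an official recommendation; sustained 429s are the unambiguous signal:

```python
def should_add_replica(utilization: float, saw_429s: bool,
                       high_water: float = 0.8) -> bool:
    """Decide whether to scale out a pinned namespace.

    `utilization` is pinning.status.utilization from
    GET /v1/namespaces/:namespace/metadata. The 0.8 high-water mark is
    an illustrative choice; HTTP 429 (Too Many Requests) responses mean
    the namespace is already out of read capacity.
    """
    return saw_429s or utilization >= high_water
```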

Replicas for a pinned namespace do not currently autoscale; autoscaling is planned.

Multiple replicas are not required, but they do improve fault tolerance. If a replica hits a hardware failure, turbopuffer fails over and rewarms a new replica. With multiple replicas, that failover is less likely to surface cold query latency to your application.

Pricing

Pinned namespaces are billed by GB-hours:

namespace size (GB) × replicas × hours pinned

Queries served by a pinned namespace are not subject to usage-based pricing (TB Queried). At sustained query volume, this often makes individual queries much cheaper than per-query pricing. Exact break-even depends on your workload.

Billing has a floor of 64 GB and 10 minutes. A pinned namespace smaller than 64 GB is billed as 64 GB, and a namespace pinned for less than 10 minutes is billed for 10 minutes.
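The formula and floors above can be checked with a few lines of arithmetic:

```python
def pinned_gb_hours(namespace_size_gb: float, replicas: int,
                    hours_pinned: float) -> float:
    """Billable GB-hours for a pinned namespace.

    Billing = namespace size (GB) x replicas x hours pinned, with a
    floor of 64 GB on size and 10 minutes (1/6 hour) on duration.
    """
    billed_gb = max(namespace_size_gb, 64.0)
    billed_hours = max(hours_pinned, 10 / 60)
    return billed_gb * replicas * billed_hours


# 100 GB namespace, 2 replicas, pinned for a full day:
print(pinned_gb_hours(100, 2, 24))  # 4800.0
# A small, briefly pinned namespace hits both floors: 64 GB x 1 x 10/60
print(pinned_gb_hours(8, 1, 0.05))
```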

Configuration

Configure pinning with PATCH /v1/namespaces/:namespace/metadata (see Metadata).

Set pinning to true to pin with default settings and 1 replica.

Set pinning to null to unpin.

Inspect current settings with GET /v1/namespaces/:namespace/metadata.

Example: Enable pinning with 2 replicas
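A minimal sketch using Python's standard library. The base URL, the `TURBOPUFFER_BASE_URL`/`TURBOPUFFER_API_KEY` environment variable names, and the `{"pinning": {"replicas": 2}}` payload shape are assumptions (this page only specifies the `true` and `null` values) -- verify them against the Metadata API reference:

```python
import json
import os
import urllib.request

# Placeholder host -- substitute your region's actual endpoint.
BASE_URL = os.environ.get("TURBOPUFFER_BASE_URL", "https://api.turbopuffer.com")


def pin_request(namespace: str, replicas: int) -> urllib.request.Request:
    """Build the PATCH that pins `namespace` with `replicas` replicas.

    The {"pinning": {"replicas": N}} body is an assumed shape for
    requesting more than the default single replica.
    """
    return urllib.request.Request(
        f"{BASE_URL}/v1/namespaces/{namespace}/metadata",
        data=json.dumps({"pinning": {"replicas": replicas}}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('TURBOPUFFER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


req = pin_request("my-namespace", replicas=2)
print(req.get_method(), req.full_url)
# Send it with: urllib.request.urlopen(req)
```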

Example: Disable pinning
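Unpinning is the same PATCH with `pinning` set to JSON `null`, per the Configuration section above. As before, the base URL and environment variable names are placeholders:

```python
import json
import os
import urllib.request

# Placeholder host -- substitute your region's actual endpoint.
BASE_URL = os.environ.get("TURBOPUFFER_BASE_URL", "https://api.turbopuffer.com")


def unpin_request(namespace: str) -> urllib.request.Request:
    """Build the PATCH that unpins `namespace` (pinning: null)."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/namespaces/{namespace}/metadata",
        data=json.dumps({"pinning": None}).encode(),  # JSON null unpins
        headers={
            "Authorization": f"Bearer {os.environ.get('TURBOPUFFER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


req = unpin_request("my-namespace")
print(req.get_method(), req.full_url)
# Send it with: urllib.request.urlopen(req)
```

Queries continue to work during unpinning; traffic simply moves back to the shared multi-tenant path.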