Legora searches 2B+ legal docs on turbopuffer

Legora replaced Elasticsearch and pgvector + DiskANN with turbopuffer to build an efficient retrieval layer that minimizes input tokens without sacrificing performance, all with legal-grade data isolation and security.

2B+ documents
100ms p90 latency
98% recall@10

turbopuffer was the only search database where we could get the combination of high security, high recall, and low latency.

Joachim Koch, Engineering Manager

Legora became a $100M ARR legal tech breakout in just 18 months by building AI that completes entire legal workflows end-to-end. They process billions of legal documents for thousands of law firms and in-house legal teams across 50+ markets.

At this scale, Legora's inference budget is supply-constrained. They compete for the finite GPU-hours each provider can allocate. Token efficiency defines the ceiling for what they can support in prod.

Joachim Koch manages Legora's documents team, responsible for extracting, parsing, chunking, embedding, and indexing every legal document for search. They ensure agents don't waste tokens reading the wrong context, while also meeting the strict data isolation requirements of large law firms and corporate legal teams.

Security and scale

Legora's search stack started on a self-hosted Elasticsearch cluster, which worked well until large firms contractually demanded per-client storage isolation. On Elasticsearch that means a separate cluster per customer, unworkable at Legora's size and growth rate.

They were already running Postgres in prod, so they migrated search onto pgvector + DiskANN on Azure, which inherited their database-per-tenant isolation pattern: they could spin up a fresh Postgres database per tenant and front it with a proxy to avoid building isolation logic into application code. That solved the security problem, but introduced two fundamental performance limits they couldn't easily engineer around:

Hot partitions: pgvector forced Legora to hash (org, matter) into a limited number of partitions, producing severely hot partitions on the largest firms.
High tail latencies: Azure's DiskANN implementation became extremely memory-intensive past a certain index size, with p99 latencies spiking to 20+ seconds, which doesn't work when search sits in the critical path of almost every feature.
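The hot-partition effect can be illustrated with a minimal sketch. It assumes a plain hash-modulo scheme and made-up matter sizes; the function names, partition count, and document counts are hypothetical and are not taken from Legora's actual pgvector setup.

```python
import hashlib
from collections import Counter


def partition_for(org_id: str, matter_id: str, num_partitions: int) -> int:
    """Map an (org, matter) pair to one of a fixed number of partitions."""
    key = f"{org_id}/{matter_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_load(matter_sizes: dict, num_partitions: int) -> Counter:
    """Sum document counts per partition for a set of (org, matter) pairs."""
    load = Counter()
    for (org_id, matter_id), doc_count in matter_sizes.items():
        load[partition_for(org_id, matter_id, num_partitions)] += doc_count
    return load


# Hypothetical sizes: one very large matter and many small ones.
matter_sizes = {("big-firm", "merger-1"): 5_000_000}
matter_sizes.update({("small-firm", f"matter-{i}"): 1_000 for i in range(200)})

load = partition_load(matter_sizes, num_partitions=8)
hottest = max(load.values())
print(f"hottest partition holds {hottest / sum(load.values()):.0%} of all documents")
```

However evenly the hash spreads the keys, a matter cannot be split across partitions in this scheme, so whichever partition receives the largest matter carries nearly all of the load.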

Why turbopuffer?

turbopuffer's object-storage-native architecture solved Legora's security and performance requirements with the same design. Object storage is the only source of truth, and query nodes are stateless caches on top. This makes turbopuffer both strictly isolated and fast at any scale.

                  ╔══Legora BYOC Region════════╗
                  ║     ┌────────────────────┐ ║░
                  ║     │   ./tpuf indexer   │────┐
                  ║     └────────────────────┘ ║░ │   
                  ║                            ║░ │   ╔══Azure Blob Storage══════════╗
                  ║     ┌────────────────────┐ ║░ │   ║ ┏━/{org_id}/{namespace}━━━━┓ ║░
                  ║     │    ./tpuf query    │ ║░ │   ║ ┃ ┏━/wal━━━━━━━━━━━━━━━━━┓ ┃ ║░
                  ║     │┌──Memory Cache────┐│ ║░ └──▶║ ┃ ┃■■■■■■■■■■■■■■■◈◈◈◈   ┃ ┃ ║░
                  ║   ┌▶││■■■■■■■■■■        ││ ║░     ║ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━┛ ┃ ║░
               ┌──╩─┐ │ │└──────────────────┘│ ║░     ║ ┃ ┏━/index━━━━━━━━━━━━━━━┓ ┃ ║░
╔══════════╗   │    │ │ └────────────────────┘ ║░ ┌──▶║ ┃ ┃■■■■■■■■■■■■■■■       ┃ ┃ ║░
║  Client  ║──▶│ LB │─┤ ┌────────────────────┐ ║░ │   ║ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━┛ ┃ ║░
╚══════════╝░  │    │ │ │    ./tpuf query    │ ║░ │   ║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║░
 ░░░░░░░░░░░░  └──╦─┘ │ │┌──Memory Cache────┐│ ║░ │   ╚══════════════════════════════╝░
                  ║   └▶││■■■■■■■■■■■■■■■   ││────┘    ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
                  ║     │└──────────────────┘│ ║░     
                  ║     └────────────────────┘ ║░     
                  ╚════════════════════════════╝░
                   ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Isolation at the storage layer

turbopuffer runs on BYOC clusters inside Legora's own Azure regions, and no customer data leaves the VPC. Each legal matter (a term that lawyers use for a project) gets its own turbopuffer namespace, a separate prefix on Azure Blob Storage that is physically isolated by default. Each namespace can be secured with customer-managed encryption keys (CMEK) using existing Azure Blob Storage KMS primitives, so Legora's customers can manage, rotate, and revoke their own keys.
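As a rough illustration of the namespace-per-matter layout, the sketch below derives a namespace name and the storage prefix shown in the diagram (/{org_id}/{namespace}). The "matter-" naming scheme and the ID validation rule are assumptions for illustration only; the source does not describe how Legora names namespaces.

```python
import re

# Hypothetical rule: IDs may only contain characters that cannot alter a storage path.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def _check(value: str, label: str) -> str:
    """Reject IDs containing separators or dots that could escape a prefix."""
    if not _SAFE_ID.fullmatch(value):
        raise ValueError(f"unsafe {label}: {value!r}")
    return value


def matter_namespace(matter_id: str) -> str:
    """One namespace per legal matter (naming scheme is an assumption)."""
    return f"matter-{_check(matter_id, 'matter_id')}"


def storage_prefix(org_id: str, namespace: str) -> str:
    """Object-storage prefix in the /{org_id}/{namespace} shape from the diagram."""
    return f"/{_check(org_id, 'org_id')}/{_check(namespace, 'namespace')}"


prefix_a = storage_prefix("org-1", matter_namespace("m42"))
prefix_b = storage_prefix("org-1", matter_namespace("m43"))
print(prefix_a)
print(prefix_b)
```

Because every matter resolves to its own prefix, per-matter controls such as encryption keys can be attached at the storage layer rather than enforced in application code.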

Isolation at the compute layer

Legora's largest customers have strict DPAs that treat the contents of local disks as "data at rest", so anything cached on disk has to be encrypted and isolated. turbopuffer's default cache hierarchy puffs active data onto an NVMe SSD + memory cache across tenants on shared query nodes, which doesn't fit that contract.

turbopuffer engineers worked with Joachim's team to deploy a custom, RAM-only cache on Legora's Azure clusters. Object storage remains the source of truth, cached state lives only in volatile memory, and no unencrypted client data ever lands on local disk.

Performance

turbopuffer's query tier is stateless, and any query node can serve queries for any namespace, so hot matters scale without building memory pressure on a single node. Legora runs 10M+ namespaces today, each able to hold 100s of millions of documents, without the hot partitions of pgvector or the many clusters of Elasticsearch.

Query plans are object-storage-aware, minimizing round trips so even a fully cold query is fast (p99 ~2s) relative to the tail latencies they saw on Postgres. Combining fast cold queries with a stateless query tier, Legora achieves consistent performance across all namespaces regardless of size:

99.5% of queries are hot
800ms p90 on cold queries
100ms p90 on hot queries
10ms p50 on all queries

All queries are strongly consistent by default, so recently uploaded legal documents are immediately available in search results.

We still have very skewed namespace sizes, but even on the largest matters we just don't observe the tail latency spikes on turbopuffer.

Joachim Koch, Engineering Manager

Results

With turbopuffer, Legora hits the trifecta of legal-grade security, latency, and recall they couldn't get with their previous search engines:

Physical data isolation with customer-managed encryption through namespace-per-matter design + CMEK
98% avg recall@10 across all namespaces
10x lower p99 latency than Postgres (20s → 2s)
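For readers unfamiliar with the metric, recall@10 is commonly computed by comparing the top 10 results of the approximate index against the top 10 from an exact (exhaustive) search. The sketch below uses that standard definition with made-up IDs; how Legora samples queries and averages across namespaces is not described in the source.

```python
def recall_at_k(approx_ids: list, exact_ids: list, k: int = 10) -> float:
    """Fraction of the exact top-k results that also appear in the approximate top-k."""
    top_exact = set(exact_ids[:k])
    if not top_exact:
        raise ValueError("exact_ids must not be empty")
    hits = len(top_exact & set(approx_ids[:k]))
    return hits / len(top_exact)


def mean_recall(pairs: list, k: int = 10) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per query."""
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


# Made-up example: the approximate search misses one of the ten true neighbours.
exact = list(range(10))
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 99]
print(recall_at_k(approx, exact))  # 0.9
```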

turbopuffer in Legora

Legora ingests legal documents (PDFs, .docx files) directly out of law firms' document management systems and virtual data rooms. Documents go through OCR with bounding-box extraction so Legora can provide precise citation highlighting back to the source coordinates, not just clean text for the LLM. Documents are then chunked, embedded, and ingested into one turbopuffer namespace per legal matter with vector, BM25, and metadata indexes for filtered hybrid search.
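One common way to combine vector and BM25 results in hybrid search is reciprocal rank fusion (RRF). The source does not say which fusion method Legora uses, so the sketch below is an assumption chosen for illustration, with hypothetical chunk IDs and the conventional constant k = 60.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrieval modes for one query.
vector_hits = ["c1", "c2", "c3"]
bm25_hits = ["c3", "c1", "c4"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['c1', 'c3', 'c2', 'c4']
```

Chunks that rank well in both lists rise to the top, which helps keep the context handed to an agent small and relevant.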

Alongside per-matter namespaces, Legora indexes firm-wide documents (memos, seminal cases, partner notes) into a separate set of turbopuffer namespaces. This grounds agent answers across all matters in the firm's best practices.

What's next

Joachim and the documents team are leaning into turbopuffer as the search infrastructure to handle 100B+ legal documents:

More inside each matter: Legora runs OCR on GPUs, which constrains indexing throughput. They're evaluating lightweight, just-in-time parsing on CPUs with cheaper models to remove that bottleneck. They're also evaluating multimodal embeddings to index multiple media types per document.
Cross-matter, ethical enterprise search: A single law firm may represent clients with conflicting interests; lawyers staffed on one matter must not be able to see the other's documents under any circumstance. Legora is building this as an agentic flow: first filter to a small candidate set of matters by metadata, then issue scoped semantic + keyword searches per namespace, so the "ethical wall" stays on the security boundary already enforced at the namespace level.
Public legal research corpus: A shared, small set of very large namespaces (>=10B docs) for public legal sources queryable by all Legora tenants, using pinned namespaces for predictable pricing and performance at high QPS.
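The two-step cross-matter flow (filter matters by metadata, then search each permitted namespace separately) can be sketched as follows. This is a toy in-memory model: the Matter fields, the staffing-based permission check, and the search_namespace callable are all hypothetical stand-ins, not Legora's implementation or a turbopuffer API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Matter:
    namespace: str
    practice_area: str
    staffed_lawyers: frozenset


def candidate_namespaces(matters: list, lawyer: str, practice_area: str) -> list:
    """Step 1: metadata filter. Only matters the lawyer is staffed on can qualify."""
    return [
        m.namespace
        for m in matters
        if lawyer in m.staffed_lawyers and m.practice_area == practice_area
    ]


def cross_matter_search(matters: list, lawyer: str, practice_area: str,
                        query: str, search_namespace: Callable) -> dict:
    """Step 2: one scoped search per permitted namespace, never a global query."""
    return {
        ns: search_namespace(ns, query)
        for ns in candidate_namespaces(matters, lawyer, practice_area)
    }


matters = [
    Matter("matter-a", "m&a", frozenset({"alice"})),
    Matter("matter-b", "m&a", frozenset({"bob"})),  # conflicting client
    Matter("matter-c", "tax", frozenset({"alice"})),
]

queried = []


def fake_search(namespace: str, query: str) -> list:
    """Stand-in for a real per-namespace hybrid search; records what was touched."""
    queried.append(namespace)
    return [f"{namespace}:hit-for-{query}"]


results = cross_matter_search(matters, "alice", "m&a", "indemnity cap", fake_search)
print(results)
print(queried)
```

In this sketch the namespace for the conflicting matter is never queried at all, so the wall does not depend on filtering results after retrieval.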

We will continue to update this log as Legora's usage of turbopuffer evolves.