Vercel indexes its entire GTM memory on turbopuffer

Vercel's GTM engineering team built an AI lead agent that can access Gong transcripts, Slack channels, and Salesforce data. It has already generated over $2M in incremental revenue.

  • $2M+ incremental revenue
  • 10→1 SDR headcount
  • 32x ROI

I realized I could index Vercel's entire GTM corpus on turbopuffer with my credit card.

Drew Bredvick, GTM Engineer

Why turbopuffer?

Vercel's GTM engineering team needed a retrieval layer to power AI agents that qualify leads, coach deals, and analyze losses across the entire account conversation history. They evaluated search providers and chose turbopuffer for three reasons:

  1. Cost: Indexing the full corpus of Gong calls, Slack messages, and emails on alternative providers would have required a finance budget approval. With turbopuffer, the team could index everything on a corporate card
  2. Hybrid search: turbopuffer does both full-text search and vector search, so the team didn't need to operate two search engines to get relevant results
  3. Cold/warm economics: GTM data is "human scale." Active accounts get fast, cached retrieval; dormant accounts fade to cold storage without accruing costs for in-memory indexing or idle capacity

Internal tool teams are a cost center by definition. I was really concerned about costs exploding as the data grew, but that hasn't been an issue with turbopuffer.

Drew Bredvick, GTM Engineer

turbopuffer in Vercel

Vercel converts Gong calls, Slack channels (internal and external), and emails into standardized markdown representations, then chunks, embeds, and indexes them in turbopuffer. AI agents get a search tool and generate their own vector and BM25 queries at runtime as SDRs and AEs work specific leads. All customer data is encrypted at rest with AES-256.
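
The chunking step is not described in detail here. As a rough sketch, assuming paragraph-aligned chunks with a character budget, it might look like the following. The `chunk_markdown` helper, the 800-character default, and the row fields are hypothetical and are not Vercel's implementation; embedding and indexing are left out.

```python
from typing import Dict, List


def chunk_markdown(doc_id: str, markdown: str, source: str,
                   max_chars: int = 800) -> List[Dict[str, str]]:
    """Split a markdown document into paragraph-aligned chunks.

    Hypothetical helper: paragraphs are packed greedily until adding the
    next one would exceed max_chars. A single paragraph longer than
    max_chars is kept whole rather than split mid-sentence.
    """
    paragraphs = [p.strip() for p in markdown.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # The budget would be exceeded: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    # One row per chunk; "type" records the source system (gong, slack, ...).
    return [
        {"id": f"{doc_id}-{i}", "content": text, "type": source}
        for i, text in enumerate(chunks)
    ]
```

Each row would then be embedded and written to turbopuffer. Keeping chunks aligned to paragraph boundaries is one simple way to avoid cutting a speaker turn or message in half.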

Log

Drew Bredvick and Cameron Youngblood built Vercel's first GTM agent: a deal coaching tool for inbound sales development reps (SDRs). Their first attempt engineered context by stuffing data into prompts, but that overflowed context windows. After a few pre-processing hacks failed to scale, it became clear they needed a retrieval layer so agents could search the entire account conversation history and inject relevant context at runtime.

Drew chose turbopuffer to index Vercel's GTM corpus. The initial architecture mapped each Salesforce account to its own turbopuffer namespace for isolation:

    ┌─────────────┐
    │ Gong API    │
    │ Slack API   │
    │ Salesforce  │
    └──────┬──────┘
           │
           ▼
    ┌─────────────┐
    │   chunk +   │
    │   embed     │
    └──────┬──────┘
           ▼
╔═ turbopuffer ═══════════╗
║ ns:{sf_account_id}      ║░
║ ┌────┬───────┬─────┐    ║░
║ │ id │content│ type│    ║░
║ ├────┼───────┼─────┤    ║░
║ │c-0 │ chunk │ gong│    ║░
║ │m-42│ text  │slack│    ║░
║ └────┴───────┴─────┘    ║░
╚═════════════════════════╝░
 ░░░░░░░░░░░░░░░░░░░░░░░░░░░
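
A minimal sketch of the routing this diagram implies, assuming each chunk carries its Salesforce account ID. The `group_by_namespace` helper and the `account_id` field are illustrative names, not turbopuffer or Vercel APIs, and the actual write to turbopuffer is omitted.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_namespace(chunks: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group chunk rows into one write batch per Salesforce account.

    Hypothetical helper: the account ID doubles as the namespace name,
    so it is dropped from the row itself.
    """
    batches: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for chunk in chunks:
        row = {k: v for k, v in chunk.items() if k != "account_id"}
        batches[chunk["account_id"]].append(row)
    return dict(batches)
```

With this layout, isolation is structural: a query against one namespace can only return that account's rows. The flip side is that a question spanning accounts has to fan out over every namespace.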
    

The lead qualification agent alone allowed Vercel to reduce inbound SDR headcount from 10 to 1 and move people into outbound and account executive roles. The remaining inbound team reviews AI-qualified leads with full context from the entire account history.

In about 9 months, Vercel's GTM agents generated over $2M in incremental revenue:

  1. $2M+ in incremental revenue from the lead agent
  2. 32x ROI on the GTM engineering investment
  3. Salespeople are 10x more efficient

The namespace-per-account architecture worked well when a sales rep asked questions about a specific deal, but it couldn't answer general questions that needed to stitch results across accounts — like "what objections came up across all enterprise prospects evaluating the AI gateway?" Cameron led a refactor to a small set of shared namespaces with rich filterable attributes, so agents could query a single account or scan thematically across the entire GTM corpus.

The new architecture has three main namespaces backed by Gong webhooks that write embeddings to turbopuffer:

   ┌─────────────┐
   │ Gong API    │
   │ Slack API   │
   │ Salesforce  │
   └──────┬──────┘
          │
          ▼
   ┌─────────────┐
   │   chunk +   │
   │ embed + tag │
   └──────┬──────┘
          ▼
╔═ turbopuffer ══════════╗
║ ns: doc_summaries      ║░
║ ┌──┬───┬─────┬───┬───┐ ║░
║ │id│txt│topic│obj│act│ ║░
║ └──┴───┴─────┴───┴───┘ ║░
║                        ║░
║ ns: doc_points ◀═════╗ ║░
║ ┌──┬─────┬─────┬────┐║ ║░
║ │id│chunk│clust│acct│║ ║░
║ └──┴─────┴─────┴────┘║ ║░
║                      ║ ║░
║ ns: slack            ║ ║░
║ ┌───┬────┬────┐      ║ ║░
║ │thr│text│acct│      ║ ║░
║ └───┴────┴────┘      ║ ║░
╚══════════════════════║═╝░
 ░░░░░░░░░░░░░░░░░░░░░░║░░░
                       ║
                       ▼
            ┌─────────────┐
            │   nightly   │
            │ clustering  │
            └─────────────┘
    
  1. Document summaries: An LLM generates a structured summary of each call and tags it against a fixed taxonomy of topics, objections, products, plans, deal stages, sentiment, and stakeholders. The summary is upserted to turbopuffer with account_id, source, and extracted tags as filterable attributes so queries can target precise facets
  2. Critical moments: Each call is also analyzed by an LLM for critical moments (e.g. a pricing objection, a feature request, a competitive comparison), and the exact quotes from these moments are embedded and upserted to a turbopuffer namespace
  3. Slack: Messages from customer and deal-related Slack channels get upserted into a single namespace with thread_id and account_id as filterable attributes, so messages can be grouped into threads or scoped to a specific account without operating one namespace per channel
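
To make the attribute model concrete, here is a hedged sketch of what a document-summary row might look like before it is upserted. The taxonomy values, the `summary_row` helper, and every field name other than account_id and source are assumptions for illustration; the real taxonomy is not published in this log.

```python
from typing import Any, Dict, List, Sequence, Set

# Hypothetical fixed taxonomy; the real tag values are not given in this log.
TAXONOMY: Dict[str, Set[str]] = {
    "topics": {"pricing", "ai_gateway", "security"},
    "objections": {"cost", "migration_effort"},
}


def summary_row(doc_id: str, account_id: str, source: str, summary: str,
                vector: Sequence[float],
                tags: Dict[str, List[str]]) -> Dict[str, Any]:
    """Build one row: summary text, embedding, and filterable attributes."""
    row: Dict[str, Any] = {
        "id": doc_id,
        "account_id": account_id,
        "source": source,
        "text": summary,
        "vector": list(vector),
    }
    for facet, allowed in TAXONOMY.items():
        # Drop anything the LLM emitted that is outside the fixed taxonomy,
        # so filters only ever see known values.
        row[facet] = sorted(t for t in tags.get(facet, []) if t in allowed)
    return row
```

Validating tags against the taxonomy at write time keeps the facets predictable, which matters when agents construct filters on their own.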

A nightly HDBSCAN job clusters the critical moment embeddings and writes the cluster label back to each point as a filterable attribute. Themes that emerge from the data itself become first-class facets agents can filter on, alongside the other LLM-assigned summary tags.
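
The write-back step can be sketched independently of the clustering itself. The snippet below assumes cluster labels have already been computed by an HDBSCAN implementation (not shown) and that noise points are labeled -1, as common HDBSCAN implementations do. The `cluster_patches` helper and the `cluster` attribute name are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Union

Patch = Dict[str, Union[str, Optional[int]]]


def cluster_patches(point_ids: Sequence[str], labels: Sequence[int],
                    noise_label: int = -1) -> List[Patch]:
    """Turn clustering output into per-point attribute updates.

    Noise points get cluster=None instead of being skipped, so a label
    left over from a previous nightly run is cleared rather than kept.
    """
    if len(point_ids) != len(labels):
        raise ValueError("point_ids and labels must have the same length")
    patches: List[Patch] = []
    for point_id, label in zip(point_ids, labels):
        cluster = None if label == noise_label else int(label)
        patches.append({"id": point_id, "cluster": cluster})
    return patches
```

Because cluster IDs from one run are not guaranteed to match the next, rewriting every point each night is the simpler choice in this sketch.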

Cameron built a CLI that wraps the turbopuffer API, giving agents a harness for domain-specific searches using known filterable attributes (account_id, topic, objection, cluster, deal stage, etc.). The CLI handles scope automatically: if a Salesforce account is in context, queries are filtered to that account; otherwise, the agent searches across the entire corpus to answer thematic questions.
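
The scoping rule can be sketched as a small filter builder. The nested-list shape below is modeled on the operator-style filter expressions in turbopuffer's query API, but treat the exact syntax, the `build_filter` helper, and the attribute names as assumptions rather than the CLI's actual code.

```python
from typing import Any, List, Optional

Filter = List[Any]


def build_filter(account_id: Optional[str] = None,
                 **facets: str) -> Optional[Filter]:
    """Combine optional account scope with facet filters.

    Returns None when nothing is constrained, meaning "search the whole
    corpus". Facets are sorted by name so the output is deterministic.
    """
    clauses: List[Filter] = []
    if account_id is not None:
        # An account is in context: scope every query to it.
        clauses.append(["account_id", "Eq", account_id])
    for name, value in sorted(facets.items()):
        clauses.append([name, "Eq", value])
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return ["And", clauses]
```

For example, `build_filter(objection="pricing", topic="ai_gateway")` yields a corpus-wide thematic filter, while passing an account ID as the first argument narrows the same query to one deal.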

Now we can build queries like 'show me every objection from enterprise customers about the AI gateway' and turbopuffer just spits it out. It's made it way easier to find specifics across all our deals.

Cameron Youngblood, GTM Engineer

What's next

Vercel is continuing to expand its GTM corpus and build more agents:

  • Real-time objection handling: Surfacing relevant responses to reps during live calls using Zoom's RTMS API and turbopuffer queries
  • More data sources: Adding email, support tickets, and other context to the knowledge base as new sources become available

We will continue to update this log as Vercel's usage of turbopuffer evolves.