Simon Eskildsen [0:00]:
Trade-offs are very important in a database. That's the fundamental thing with
databases and why some databases are good at things that other databases are not
as good at.
Barr Yaron [0:07]:
Welcome back to Barrchives. I'm your host, Barr Yaron, and today I'm excited to
have Simon Eskildsen, co-founder and CEO of turbopuffer, joining us. And
turbopuffer powers search for fast-growing startups like Cursor, Notion, and
Linear.
Simon Eskildsen [0:22]:
The fundamental trade-off for turbopuffer is that writes are slow, right? We
have to commit them directly to object storage, so it takes hundreds of
milliseconds. You're not going to do an OLTP workload on something like that; it
just doesn't make sense. You can't do checkouts for e-commerce on something like
that unless you get it all done in a single transaction. And then once in a
while, you'll hit a node that doesn't have the data on disk, and you'll get a
cold query, and it will be a couple hundred milliseconds instead of tens of
milliseconds. For search, that's a perfectly acceptable trade-off. For
high-frequency trading, maybe not so much. We happen to think that this set of
trade-offs is pretty phenomenal for a lot of workloads, especially search.
Barr Yaron [1:06]:
Hey, Simon. I'm so excited to have you on today. It's sort of insane to me that
you started turbopuffer in 2023, and it's become such a beloved product. You're
running hundreds of billions of vectors. Top customers are building on top of
turbopuffer, so I'm excited to get into it today.
Simon Eskildsen [1:23]:
Thank you so much for having me, Barr.
Barr Yaron [1:25]:
Okay, well, let's start with your aha realization around turbopuffer. You were a
principal engineer working on infrastructure at Shopify, but then you had your
period where you consulted with startups, helping them with their infrastructure
and scalability issues. So when in this process of working with companies did
you realize maybe there is a need for turbopuffer? It's time to build this
thing? How did that happen?
Simon Eskildsen [1:48]:
Yeah, I think maybe the full background is useful here. So yeah, as you said, I
spent almost a decade working on infrastructure at Shopify. It was kind of a
ragtag team of software developers who learned the infrastructure as we went,
along with the operations people, to just make sure that this Rails app
continued to scale as the company did. When I joined, we were doing a couple
hundred requests per second, and when I left in 2021, we had peaks of more than
a million requests per second. The hardest thing to scale through all of that is
the data layer. So as part of that, my colleagues and I spent
thousands, tens of thousands of hours—well, probably 10,000 hours at this
point—scaling every single part of the data layer of Shopify: MySQL, Redis,
Memcached, Elasticsearch, like all of these things and proxies in front of it. I
have some experience running these machine-based solutions at scale, and they're
really good for the kind of e-commerce search that we needed, where it's very
important to search almost all of the data a lot of the time. But it always
seemed to me that there might be a better way to do search. What the future of
search might look like didn't occur to me until I spent a couple of years
bopping around, helping my friends' companies in small increments with their
infrastructure challenges. One of the companies, my friend's company Readwise,
asked me if I could build a small recommendation engine after I was done
spending a bunch of time tuning their Postgres auto vacuum, which is like the
most common scaling challenge I think in the 2020s. We wanted to build a
recommendation engine, and I thought that vectors just looked amazing because
one of the search problems that we talked a lot about at Shopify was sort of
mapping the vocabulary of the user with the vocabulary of the store. So you
search for "red dress," and they have a "burgundy skirt," and you search for
"shoe," and they have some lime green sneaker, and it just doesn't come up,
right? Because you're searching for strings, not for things.
Barr Yaron [4:05]:
The words are not...
Simon Eskildsen [4:07]:
Exactly. You're searching for strings, not for things. And so you've got to turn
strings into things. Vectors are really good at that, right? You chop the head
off of the LLM, and out come these numbers that you can plot in a very large
coordinate system, and things that are adjacent in that coordinate system are
also adjacent in the real world. The LLMs were wonderful for that. So for my
friends at Readwise, we made a small recommendation engine that did this, and it
was pretty good. Without much tuning, it actually did an okay job recommending
articles, and it makes sense because it's trained on articles on the web, so it
was very good at it. But when I did the napkin math on how much this was going
to cost, it was close to maybe $30,000 to $40,000 on the reputable vector
database at the time. It made sense; it was new. But it turned out that all of this was
stored in memory. At Readwise, the founders put this in the bucket of, "Well,
that seems really neat, but it just costs too much, so we're going to wait for
the cost to go down as token costs have also come down." I couldn't stop
thinking about it. I was like, "Why has no one built a database that takes
advantage of the things that we have available to us now?" Because it seems like
the perfect trade-off for search. We have NVMe SSDs; they're about 100 times
cheaper than RAM, but the memory bandwidth is only maybe about 5 to 10 times
lower. We have S3 that is finally strongly consistent, which is a very nice
property to have when you're building a database; that happened in late 2020. And then we
also have compare and swap on object storage, which means that we can now build
a database that, like a pufferfish, inflates into the memory hierarchies from
object storage into NVMe and finally into memory, with the only downside being
that all the writes have a higher write latency. We thought that this was a
perfect set of trade-offs for search. So that is the long story of how I went
from consulting with Readwise to building turbopuffer.
Barr Yaron [6:03]:
I mean, that's very helpful. And I like that, you know, for someone who spends a
lot of time thinking about search, I'm surprised I haven't come across "things,
not strings," but it's a good tagline. I mean, it makes sense. You have these
new capabilities, like you mentioned, NVMe SSDs in the cloud, consistent S3, and
compare and swap. And then you have what you had mentioned, which is sort of
this new workload of LLMs. What are some of the other things that made this the
right point in time, whether it was workload, data type, or other capabilities?
Simon Eskildsen [6:34]:
Yeah, so I think if you want to build a new database, you need two things. You
need a new workload, and you need a new storage architecture. The new workloads
seem to be that there's an enormous amount of data that wants to be connected to
LLMs. The models are just hungry for more data, and they're hungry to reason
over the data. But in order to do that at the scale that was required, we also
needed to change the economics of the storage. If you take all of the data and
store it in memory as vectors, then these vectors are 10, maybe 100 times larger
than the dataset itself because if you have a kilobyte of text, that turns into
usually tens of kilobytes of vectors. So the need for a new storage architecture
is even more prominent there in terms of just the pure costs. If you store a
gigabyte of data on disk, it's about $0.10 per gigabyte of disk, but you run the
disks at around 50% utilization, so it's really $0.20, and then you replicate it
three ways, so you end up paying about $0.60 all in per gigabyte of data that
you store, versus about $0.02 per gigabyte when you store it in object storage.
If you're accessing it a lot, you only have to cache it on a single machine,
which might cost you somewhere around $0.05 per gigabyte. So all in, you're
about an order of magnitude cheaper than the base cost of replicating this on
disk. So the two things you need for a new
database are a new workload, which is connecting lots of data to LLMs, and the
second thing you need is a new storage architecture, the new storage
architecture being that object stores are a source of truth, and we just cache
the data aggressively that needs to be accessed a lot.
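As a rough illustration of that napkin math (the per-gigabyte prices above are treated as illustrative assumptions, not any provider's actual list prices), a minimal sketch in Python:

```python
# Napkin math from the conversation. All prices are illustrative assumptions in
# rough $/GB-month, not any provider's actual list prices.
DISK_PRICE = 0.10            # NVMe-backed block storage, per GB
DISK_UTILIZATION = 0.50      # disks typically run about half full
REPLICAS = 3                 # classic three-way replication
OBJECT_STORAGE_PRICE = 0.02  # object storage, per GB
CACHE_PRICE = 0.05           # amortized cost of caching hot data on one node

triple_replicated_disk = DISK_PRICE / DISK_UTILIZATION * REPLICAS    # ~$0.60/GB
object_storage_plus_cache = OBJECT_STORAGE_PRICE + CACHE_PRICE       # ~$0.07/GB

print(f"disk, 3x replicated:        ${triple_replicated_disk:.2f} per GB")
print(f"object storage + one cache: ${object_storage_plus_cache:.2f} per GB")
print(f"roughly {triple_replicated_disk / object_storage_plus_cache:.0f}x cheaper")
```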
Barr Yaron [8:17]:
Let's actually walk through, okay, you had this realization when you were
helping Readwise. You built, you eventually built turbopuffer. What goes into
the simplest version of turbopuffer, the initial version that you put in front
of a first customer? I know you've done a lot of optimizations and work since
then. So architecturally, what goes into it, and what does it take to build
that?
Simon Eskildsen [8:39]:
It takes a lot of embarrassment, I would say, to put out that version. I think
there's a lot of these startup platitudes that you hear and that you don't fully
internalize until you're in it. The play-by-play here was that it was March of
2023, and I had this idea, and I was talking through it with a friend. He really
encouraged me to just go for it, and so I started thinking about it. He's a
really good friend; it's actually the friend who has since designed the website,
which is now a big part of our identity.
Barr Yaron [9:10]:
Oh.
Simon Eskildsen [9:10]:
I sat down and started working, learning everything I could about all the
different vector indexing algorithms and then reasoning through which ones would
work on object storage and which one wouldn't. I spent a summer up by a cabin in
Canada and just completely focused on this first version of the database. It
ended up being the simplest thing that I could possibly ship that had acceptable
performance.
Barr Yaron [9:43]:
How do you define acceptable performance?
Simon Eskildsen [9:45]:
Acceptable performance was a hot query around 100 milliseconds, which seemed
good enough to ship, and a cold query around one to two seconds, which seemed
good enough to me to ship with the economics that we had. There was no reason to
me why this couldn't be as fast as the fastest in-memory one out there, just
with better economics, so you could really get all those benefits out of the
gate. The first version of turbopuffer was literally just a file called
centroids that was on object storage. You download the file of the centroids,
and then you search through them all, and then every cluster in the vector index
was in other files that were downloaded in a second round trip into the
process. That was it. I could go into more detail on what that exactly means,
but it was very simple. It was just two round trips back and forth to object
storage. At the time, there was not even an SSD cache; I just put a caching
engine in front of it. I ran the entire thing in a TMUX session on a single node
in prod. It was literally the simplest possible thing that I could come up with
that I could ship after that summer of running an inordinate amount of
experiments on figuring out how to make all of this fast, because it's not quite
as simple as I make it sound to do the indexing and everything in a way that has
high recall.
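A minimal sketch of that two-round-trip flow, assuming hypothetical object keys and a hypothetical get_object helper; the real engine is more involved:

```python
# Sketch of the V1 two-round-trip query flow described above. The object keys and
# the get_object helper are hypothetical; the real storage engine is more involved.
import numpy as np

def get_object(key: str) -> np.ndarray:
    """Hypothetical helper: fetch one file from object storage, parsed as a vector array."""
    raise NotImplementedError

def query(q: np.ndarray, n_probe: int = 4, k: int = 10) -> np.ndarray:
    # Round trip 1: download the centroids file and rank clusters by distance to q.
    centroids = get_object("index/centroids")                    # (n_clusters, dim)
    nearest = np.argsort(np.linalg.norm(centroids - q, axis=1))  # closest clusters first

    # Round trip 2: download only the closest clusters, then scan them exhaustively.
    candidates = np.concatenate(
        [get_object(f"index/cluster/{c}") for c in nearest[:n_probe]]
    )
    dists = np.linalg.norm(candidates - q, axis=1)
    return candidates[np.argsort(dists)[:k]]                     # top-k nearest vectors
```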
Barr Yaron [11:12]:
What's the biggest challenge there in version one?
Simon Eskildsen [11:15]:
The biggest challenge was that it wasn't clear what indexing algorithm you
wanted to use. So at a high level, when you're building a vector index, you sort
of have three options. The first option is the simplest one. It's like when you
have a query vector, you compare it to every single vector in the target
dataset, and you return the top k closest ones.
Barr Yaron [11:36]:
And then you have to look through everything.
Simon Eskildsen [11:36]:
You have to look through everything, and it works for, you know, if you have
about a gigabyte of vectors, you can read that at maybe 10 to 20 gigabytes per
second if you max out the machine, so you can do maybe 10 requests per second if
you exhaust the machine. Latency will be around 100 milliseconds; it sort of
works, but as you get into larger and larger sizes and more queries per second,
it sort of starts falling apart. The second option is to use a graph-based
index, and this is all the rage. All of the existing productionized
implementations were using this algorithm called HNSW. With HNSW, you
essentially use heuristics to construct a graph where vectors that are adjacent
in vector space are also connected in the graph. The problem with this approach
is that if you store the data on object storage, every time you navigate a node
in the graph, you have to go to object storage. The p90 to object storage is
maybe 200 to 300 milliseconds. So every time you navigate, you start at the
center, 200 milliseconds; you go one out, 200 milliseconds, 200 milliseconds as
you navigate. This is really fast in memory because you only need to do maybe
nine to 10 reads to get all the closest vectors, but it is extremely slow on
object storage. It takes many, many reads per query; even on disk that's slow,
because disks are not good at a lot of small reads with very low bandwidth per
read. HNSW is
phenomenal because you just insert vectors, and they just go into the graph, and
it works great. It's the economics that are difficult. If you're storing a
billion vectors in an HNSW graph, you kind of have to store the whole thing in
memory or maybe some of it on disk. It gets very complicated very quickly, and
the costs become astronomical—tens of thousands, maybe even hundreds of
thousands of dollars to store a billion vectors, which you can do at a thousand
dollars with turbopuffer. So these orders of magnitude of improvement in storage
really add up, but this poses a challenge, because the reason HNSW is so popular
is that it has very high recall, very high accuracy against the exhaustive
search, and it's very easy to maintain. The third approach is actually, I think,
almost the most obvious one if you just sat down and drew a bunch of vectors in
a coordinate system, say in 2D, then natural clusters will occur. If we go back
to the e-commerce example, you can
imagine some of the vectors that talk about dresses and skirts are in one
cluster, some that talk about shoes are in another cluster, and some that talk
about pants are in the third cluster.
Barr Yaron [14:14]:
Honestly, I would greatly separate the dresses and skirts. I don't agree with
that example, but yes, directionally, I see what you're saying.
Simon Eskildsen [14:21]:
Well, the skirts and the dresses are like adjacent-ish, right? But still. So
there's probably clothing items that I don't know the name of that you would
know the name of that are right in between—a romper maybe—but the shoe cluster
we can say is a little bit further away. Either way, the idea then is that there
might be three natural clusters here, and so you take the centroids of those
clusters. The centroid is just the average of all of the members in vector
space; it's like an artificial vector. It doesn't make sense really to take the
average of, you know, a romper and pants and dresses, but that forms a centroid.
Now, instead of having, say, 100 vectors, you have three vectors,
one for each cluster. When you do the search, you just look at what is the most
adjacent centroid to my query vector, and then you download only all of the
vectors that belong to that cluster with that centroid. This is the most
old-school way of doing vector search. You run a big clustering algorithm over
the entire thing, and you return the most adjacent clusters, and you search
those clusters exhaustively, basically. It works fine. It's not as fast as HNSW
unless you're very careful about how you construct the clusters. But
constructing the optimum clusters is essentially an NP-hard problem; it
takes an enormous amount of time. So there's a lot of heuristics that go into
it, just like the graph. But it works really well for disk, and it works really
well for object storage because whether you're downloading 100 MB or 1 MB from
object storage, there's just not a big difference. On disk, there's also not a
huge difference. Of course, there is a difference, but it's not the same kind of
difference as doing a lot of random searches, right? You can do a lot of this in
a round trip with just a small extra penalty. So it works really well for disk
because you just get the centroids, and then you get the clusters that match.
You go to object storage, and it's the same thing. For memory, you can get away
with a lot of random reads into a graph in the time span that you can read all
of that memory. For me, figuring it out and really moving myself away from the
status quo—that everything should be a graph, and that to make graphs work on
disk you just shrink them so they do less graph search (DiskANN-style
ideas)—took a long time, because there was nothing really pointing that way.
Everything seemed to be trending in the direction of graphs.
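To make the clustering step concrete, here is an illustrative sketch that uses a few rounds of plain k-means as a stand-in for the much more careful heuristics a production index would use; the function name, iteration counts, and example sizes are assumptions:

```python
# Illustrative "old-school" clustering: a few rounds of plain k-means. A production
# index uses far more careful heuristics; this just shows the core idea of centroids
# as the average of their cluster members.
import numpy as np

def build_centroids(vectors: np.ndarray, n_clusters: int, iters: int = 10):
    rng = np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)].copy()
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Each centroid becomes the average of its members (an "artificial vector").
        for c in range(n_clusters):
            members = vectors[assignment == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, assignment

# Example with made-up sizes:
# centroids, assignment = build_centroids(np.random.rand(1000, 768).astype(np.float32), 32)
```

At query time you compare the query vector against these centroids first and scan only the clusters whose centroids are closest, as in the earlier two-round-trip sketch.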
Barr Yaron [16:42]:
Yep. Yep. And so the most difficult part was actually making that fundamental
architectural decision and making sure that it's the decision point and not the
implementation of it.
Simon Eskildsen [16:52]:
I think it was, yeah, getting high recall on that kind of solution with
something that worked performantly on object storage. I had some false starts. I
started by using a Cloudflare Worker and doing it there and had to move to
servers, had to build a small storage engine. I tried a bunch of different ways
of making the index online-updatable so you didn't have to retrain the whole
index every time you did enough writes. That took some time, building a simple
implementation of a WAL. There was just a lot of—I probably did three or four
rewrites before I shipped the simplest thing over that summer.
Barr Yaron [17:27]:
And then from shipping the simplest thing into getting it into the hands of your
first customer, what does that look like?
Simon Eskildsen [17:33]:
Yeah, so when I launched it, I was kind of exhausted from having worked the
whole summer on it, and I launched it in the beginning of October in 2023. I got
a nice email from one of the Cursor co-founders. This is back when Cursor was a
smaller team. Knowing that team so well now, I can imagine that they sat around
the dinner table and said, "Oh, like these vectors are so large, and the query
profile that we need for doing retrieval over a code base just matches so well
that we can hydrate it into a cache when we actually query it. Only a percentage
are active." They would have just come up with this. I don't know if they did or
not, but either way, it slotted right into how they thought about how this
problem should be solved with the right set of trade-offs for them. Graphs are
great if you're searching like a billion products all the time and you're eBay
or Shopify, but for something like Cursor, where so much of the data is
inactive, this architecture made a lot of sense to them. So they reached out,
and they just sent like 10 bullet points with a bunch of numbers—like what kind
of cost they were running up against right now, why it didn't match their unit
economics, what kind of load they had, what kind of features they would need. We
just went back and forth a bit on bullet points. Cursor was growing really well
in 2023, but it was not as big as it is now. I felt like I needed to go meet
this team in person. I had the instinct that I just needed to fly to San
Francisco but not make them feel bad about it. So I just said that I was going
to be in San Francisco on Monday.
Barr Yaron [19:14]:
It's the classic move.
Simon Eskildsen [19:17]:
I didn't know at the time, but I went to their office, and we had some long
discussions. I spent a bunch of time helping them also with their Postgres. I
mean, they were growing a lot at the time, and they were a very, very small
team. We spent a lot of time talking about their Postgres and how to tune auto
vacuum, coming back to that. Then I told them how turbopuffer worked, where we
were going with it, and we decided to partner. They moved all of their load over
the coming weeks after that to turbopuffer back in 2023. By moving them to this
new storage architecture with this new set of trade-offs, they were able to
reduce their storage costs, their vector costs, by 20x or 95%, which just
matched their unit economics a lot better.
Barr Yaron [20:08]:
Cursor, first of all, is a phenomenal first anchor customer, and they've grown
tremendously. Also, their use case makes a lot of sense, right? Historically,
customers have large vector indices with very high usage; only a fraction for
Cursor need to be queryable at any point in time. They only need the index in
memory for the period the user is actively querying the code base. It makes a
lot of sense. When you thought about initial early customers once you had it in
the hands of the first one, how did you think about the trade-offs of kind of
like who turbopuffer is not the best fit for, where turbopuffer particularly
excels, and how do you think that—or do you think that—changes over time?
Because to your point on Readwise, some of it is we cannot build a feature
because it's too expensive right now. Cursor saved a lot of money; they can do
more. That's going to be true for a long tail of customers. So maybe your belief
is just that the total market grows. How do you think about dividing the market and
where turbopuffer slots? This is the short version of that question.
Simon Eskildsen [21:11]:
Yeah, I think that I didn't really think about any of those things at the time
is the honest answer. I think that I can talk now about ideal customer profiles;
I can talk about—I can use all these terms that I didn't even know at the time.
But at the time, it just came from a strong instinct that we could make this 100
times cheaper. It is offensive to me that all of these existing incumbents are
in memory because it feels like there's a lot of workloads out there, like the
one I saw at Readwise, that really just cannot afford this and are okay with a
different set of trade-offs than the incumbents at the time. I happen to think
they were a really good set of trade-offs. I didn't know what the customers were
going to look like. I was only thinking about Readwise at the time and thinking
that there must be others out there and that it must be a common problem. Now I
can talk in much more sophisticated terms. I was just sitting down a bit earlier
today thinking about what kind of questions you might ask today, and one of the
things that I reflected a bit on is just that the language—and I mean, you've
also gotten to know me over the past few years—the language that you use to
describe these things sounds like, "Okay, yeah, sat down, did the napkin math,
built the database, got customers with the ICP," and it just looks like this
master plan being executed. But it never looks like that from the point of view
of the founder, and I think that any founder telling you that would be
disingenuous. At the time, it just came from being immersed and having spent so
much time in the napkin math soup and knowing exactly what things cost in the
cloud down to the cent on almost every SKU, and then just thinking, "Hey, if we
put these things together, we could build something very different, very
different economics." There's got to be a bit of a Jevons paradox, you know, gas
gets cheaper, people drive more thing at play here, and it turns out that that
was right.
Barr Yaron [23:39]:
So Simon, I'll ask you something. I'll ask it slightly differently and more
pointedly, although I do want to get into some of the technical trade-offs, which
is at what point in time did you gain conviction? Because you're like, "I'm
doing this. I see that Readwise has this problem. I suspect this is going to be
a problem for other people. It's a perfect use case for Cursor." But, you know,
there have been many vector databases, today and in the past, and then there's also a
subset of folks who are using things like pgvector on top of their databases. So
at which point in time did you gain conviction that there is a large market here
and this is what you want to do for many years to come?
Simon Eskildsen [23:39]:
I think that in the beginning, we were very set on scaling for Cursor and giving
them an amazing experience. We picked up some other customers that believed in
us very early. These customers that are your first signups and that join the
Slack channels, it's a very special relationship even now, years later, that you
have with them. At some point, one of our peers launched an architecture that
looked very similar. At that time, we were just continuing to see people who
really liked the product and they liked the performance. I think early 2024 is
when we started getting very serious conviction on the kinds of workloads. I
would say that there was a day where we showed one of our early customers a
quote. Previous to that, they were using another vector database
with a different set of trade-offs that turned out to not be ideal for them, so
they were paying for performance they didn't need. When I showed them the quote,
they asked me to show them a quote for 10x the data volume because now they
realized that this would unlock some product that they've wanted to build, but
that the per-user economics previously were just holding them back. This was in
around May of 2024, and that's when my conviction really dialed up. And now,
seeing how much modern agents and models are spending just querying datasets has
increased my conviction to an inordinate level.
Barr Yaron [25:35]:
I mean, that's awesome. Let's talk a little bit about what you've learned with
these customers. So we talked about what it took to make that first simplest
version of turbopuffer. What are the core optimizations and changes that you've
made since then? And then the last thing, we'll get there later: I'm curious how
agents play into all of this and what you think ideal storage for agents looks
like. So we'll do the optimizations so far and then the optimizations you see in
the future.
Simon Eskildsen [26:00]:
Yeah, so turbopuffer V1, the team internally makes a lot of fun of it. They call
it founder code. I call it the reason you have a job. The other day, someone was
tagging a bot, a Cursor agent inside of our Slack, saying, "Hey, can you remove
all the code done by Simon?" So there's a running joke to get rid of every
single vestige of the first version. But it got us very far. I did not expect it
to get us that far, but it was rebuilding the entire index periodically. It was
very simple. We moved from a very simple binary encoding to zero copy. We moved
away from Nginx very quickly; we moved away from running everything on one TMUX
very quickly, and just maturing on that first engine. It became very clear in
the beginning of 2024 that this initial engine was going to sort of reach end of
life by mid that year, given the growth that we were seeing. Again, we knew we
had a lot of room for optimization there, but at the time, another engineer
joined us—a phenomenal engineer—and more or less, he was focused on just
building a new engine based on the workloads that we've seen. Very write-heavy
needed to do incremental maintenance of the clustered index. It took a lot of
time to get that right, building on top of a proper LSM for object storage
rather than the very hacky storage engine that I had written. So we sort of
exhausted the potential of V1 by mid-2024, and then we completely replaced it
with V2 in the fall of 2024. The V2 engine is like a textbook, very simple, sort
of CS 101—at least initially was not so much anymore—implementation of an LSM on
object storage with the trade-off that that comes with. Then it was using an
incremental clustered algorithm called SPFresh to maintain these clusters
without having to rebuild the world periodically. We switched over completely to
that. There's a lot more optimizations we could go into now on the V3 engine,
but we expect the V2 engine to be the foundation of what we iterate on for a
very, very long time.
Barr Yaron [28:43]:
You know, you mentioned the indexing as sort of the big decision for V1. Between
V1 and V2, what were the most challenging decisions? So, for example, you
mentioned that one of your engineers focused a lot on writes, and there was a
trade-off in terms of the number of writes. So, you know, what were the core
decisions between V1 and V2 that you all spent a lot of time thinking about?
Simon Eskildsen [29:06]:
The biggest pain point really was to get to something that would maintain the
clusters incrementally, right? Like, suddenly you have one cluster, and then
someone starts adding a lot of dresses and
whatever into it, and you have to split the cluster to make the search
efficient. This, when you're doing it at tens of thousands of writes per second
over tens of millions of vectors, is a very difficult problem, and it's very
important because otherwise index accuracy will degrade over time. It's not like
a B-tree where it's very simple to prove that it just remains stable over time
as you add and remove elements. It's very challenging to do. A paper came out
around that time about incrementally maintaining these clusters, and I'd
experimented with some of that during the first summer because there was an
intuition that at some point you could split a cluster, and maybe
if you took enough away from the cluster, you could merge it and things like
that, but I could not get it right. There are a couple of good ideas in that
paper.
Barr Yaron [30:06]:
Never talked about it, whenever that was.
Simon Eskildsen [30:10]:
Yeah, and I think we weren't even convinced that this paper was a good idea.
Boyan, who implemented it, was certainly not convinced that it was even remotely
possible to do this at a high recall. But we started working on it; we started
experimenting; we saw good results. But we have had to do a lot of work to make
this work properly at scale. I think that if datasets are not changing very
much, you can get away with just rebuilding the world, and a lot of businesses
will be able to do that. But if you want to maintain indexes with tens of
millions, hundreds of millions of vectors, you really need to have something
where you can maintain these without having to re-cluster the entire dataset,
which is extremely expensive. So the biggest development in the V2 engine really
was the move to this, and then also redesigning the storage engine. The
first storage engine was very simple in terms of like, "I'm going to put this
file here, and it has this data," whereas the V2 storage engine is a key-value
store, right? It's like an LSM where we think about compaction, and we think
about SSTables and all of these different primitives rather than just a struct
that is put into a file and zero copied out of that file. It is a much more
structured thing to iterate on as turbopuffer supports more and more query
types: not just vector queries but also full-text queries and some of the
aggregations we can do now and these kinds of things. So it was really a
maturing of the database where V1 was get us to market, get some customers, and
learn from the workloads because I think that it was clear to me that the
workload that these AI companies were going to have was not going to be
completely clear to us, and there was going to be a different set of trade-offs.
We really learned on V1 what those trade-offs were that could go into the other
engine, like very write-heavy, and we learned a lot about how long things should
stay in cache for and so on and so forth.
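A rough sketch of the incremental-maintenance idea: rather than re-clustering everything, a cluster that grows past a threshold is split locally. This is only in the spirit of that approach, not the SPFresh algorithm itself; the threshold and the tiny 2-means split are made-up details:

```python
# Illustrative split rule in the spirit of incremental cluster maintenance: when a
# cluster grows past a threshold, split it locally with a tiny 2-means pass instead
# of re-clustering the whole dataset. Threshold and loop counts are made up; this is
# not the SPFresh algorithm itself.
import numpy as np

MAX_CLUSTER_SIZE = 512  # arbitrary threshold for the sketch

def maybe_split(members: np.ndarray):
    """Return a list of (centroid, members) pairs: one pair, or two if we split."""
    if len(members) <= MAX_CLUSTER_SIZE:
        return [(members.mean(axis=0), members)]
    # Seed two centroids far apart, then run a few assignment/update rounds.
    a = members[0]
    b = members[np.linalg.norm(members - a, axis=1).argmax()]
    for _ in range(5):
        to_a = np.linalg.norm(members - a, axis=1) < np.linalg.norm(members - b, axis=1)
        a, b = members[to_a].mean(axis=0), members[~to_a].mean(axis=0)
    return [(a, members[to_a]), (b, members[~to_a])]
```

The hard part, as Simon describes, is doing this online at tens of thousands of writes per second over tens of millions of vectors without recall degrading.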
Barr Yaron [32:01]:
Maybe to be explicit about that, I mean, write-heavy is one of the things, but
if you had to summarize how AI-native workloads pressure databases in
fundamentally different ways, how would you sort of in two sentences describe
that?
Simon Eskildsen [32:13]:
Yeah, it's probably like a hundred to one, right? The read-write ratio might be
something to aim for. For some, it's different, but that's something that we see. The
other thing is compaction is fundamentally different on object storage than it
is on a disk. There's no literature about that.
Barr Yaron [32:31]:
How frequently are you seeing writes and the number of writes, and how are you
dealing with that?
Simon Eskildsen [32:36]:
I mean, the biggest thing about the number of writes is that turbopuffer is
designed around doing everything to object storage and not having any metadata
layer. I think writes, when you have to coordinate across multiple nodes, are
very challenging to do, but we just commit files to object storage, and object
storage is extremely scalable, so that's one way that we think a lot about
writes. The other one was the incremental updating of the indexes, which is
obviously extremely important if you're doing a lot of writes. Those are
probably some of the things. I mean, when you think about compaction, you also
want to know how many writes are coming in, how often do you have to compact the
database, how do you compact it, how do you lay out the LSM. These things—the
read-write ratio dictates all of those things. Not to say that turbopuffer is
not phenomenal at reads as well, but we do see a lot of writes.
Barr Yaron [33:32]:
The answer may just be it made no sense with the architecture, but was there
ever a consideration to have a metadata layer?
Simon Eskildsen [33:38]:
There was. My co-founder and I spent a lot of time talking about whether we
should have a metadata layer and felt like everything was leading us to that
point. Richie at WarpStream, who you also know, and I kind of became friends
early on because we were both building on the same architecture, and for
them, it was a very clear decision, right? The Kafka protocol sort of required
an enormous amount of coordination with the metadata layer. But we had the
luxury of there not being a real standard for these search workloads, so we
could design the protocol around not having a metadata layer if we could get
away with it. But we really thought we would have to. We also thought that you
would want to replicate just to make the writes faster and have lower latency.
But it turned out that with compare and swap on object storage, we were able to
do all the metadata on object storage itself, and frankly, it probably came a
little bit more out of necessity in the beginning than anything else—again, it
looks maybe very clever in retrospect, but really at the time, it's like, "Well,
our customers are scaling really fast; we don't really have time to look at a
metadata layer." And it was just literally the metadata files were just JSON
files on object storage that we were just doing CAS on, and it worked better
than we expected it to do. When we needed a queue, we also just implemented it
on object storage. Well, maybe we'll have to use a better queue at some point,
but it kept scaling. I think the bitter lesson of scaling infrastructure is
that the simple thing often takes you very far. We kept
learning that again and again at Shopify as well, but you keep being surprised
too.
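A minimal sketch of that compare-and-swap pattern on a metadata file, shown here with Google Cloud Storage generation preconditions as one example of conditional writes (other object stores have equivalents); the key name, schema, and mutate callback are illustrative, not turbopuffer's actual layout:

```python
# Minimal compare-and-swap loop on a metadata file in object storage, sketched with
# Google Cloud Storage generation preconditions (other object stores have equivalent
# conditional writes). The key name, schema, and mutate callback are illustrative.
import json
from google.cloud import storage
from google.api_core.exceptions import PreconditionFailed

def update_metadata(bucket: storage.Bucket, key: str, mutate) -> dict:
    blob = bucket.blob(key)
    while True:
        blob.reload()                                   # read current generation
        current = json.loads(blob.download_as_bytes())
        updated = mutate(dict(current))
        try:
            # Only succeeds if nobody else wrote the object since we read it.
            blob.upload_from_string(
                json.dumps(updated), if_generation_match=blob.generation
            )
            return updated
        except PreconditionFailed:
            continue                                    # lost the race: re-read and retry

# e.g. bump a sequence number atomically (made-up key and field):
# update_metadata(bucket, "namespaces/docs/metadata.json",
#                 lambda m: {**m, "last_seq": m["last_seq"] + 1})
```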
Barr Yaron [35:20]:
That makes sense. Look, you talked a lot here about trade-offs, right? Like the
trade-off of, you know, we didn't even have time at the beginning for the
metadata layer, and now we don't think it makes sense. But, you know, all of
these databases, they do make some trade-off between latency and accuracy. Can
you just tell me a little bit about how you measure accuracy? I know that
turbopuffer does the automatic sampling of, I think, like 1% of queries to
measure accuracy of index recall, but just a little bit more color on how you
all think about that.
Simon Eskildsen [35:54]:
Yeah. So on the accuracy for recall, that's really important to us. I think that
we didn't feel comfortable with just the academic benchmarks at the time. The
academic benchmarks use dimensionalities that we weren't comfortable with, in
the sense that we weren't seeing our customers using datasets like these.
A lot of the academic datasets are maybe 128 or 256 dimensions. Most of the
production datasets we see have much higher dimensionality than that. The other
thing about those datasets is that they don't have filtering. So if you filter
by products that are on sale in Canada, it sort of cuts maybe half of the
clusters in half. Then how many vectors are you supposed to look at to get good
recall? This is where we really feel that nothing tells the truth like
production, and the way to get the truth from production was to sample a small
percentage of the queries, run them against an exhaustive search on the indexing
nodes, and then just submit the results to Datadog.
In Datadog, we have a view of every organization and their recall, their p10
recall, and all of that. We spend a lot of time looking at that, and we look at
query plans and everything. At some point, we'll for sure expose this to users,
but it was the only way we felt comfortable that every query plan was going to
have high recall in production. So this has been a very important consideration
in everything we've done for turbopuffer, because we don't want our users to
have to guess whether their search results are off because of inaccuracies in
the search engine.
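A sketch of what that kind of recall sampling looks like: for roughly 1% of queries, rerun the query exhaustively and compare the approximate index's top-k against the exact top-k. The helper names are assumptions, and the print stands in for a real metrics client:

```python
# Sketch of production recall sampling: for ~1% of queries, rerun the search
# exhaustively and compare against what the approximate index returned. The metric
# "emission" here is a print; in production it would go to something like Datadog.
import random
import numpy as np

SAMPLE_RATE = 0.01  # roughly 1% of queries

def exhaustive_top_k(q: np.ndarray, vectors: np.ndarray, ids: np.ndarray, k: int) -> set:
    dists = np.linalg.norm(vectors - q, axis=1)
    return set(ids[np.argsort(dists)[:k]].tolist())

def maybe_measure_recall(q, approx_ids, vectors, ids, k: int = 10):
    if random.random() > SAMPLE_RATE:
        return None
    truth = exhaustive_top_k(q, vectors, ids, k)
    recall = len(truth & set(approx_ids[:k])) / k
    print(f"recall@{k} = {recall:.2f}")  # e.g. tagged per namespace in a metrics system
    return recall
```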
Barr Yaron [37:42]:
I mean, I love that. Look, you've lived the mess of when things don't work in
production, and so you have a lot of empathy for that and for making
sure that things work as you expect. I mean, you know, we talked about—we
alluded to talking about the optimizations and the changes to turbopuffer in the
future. So, you know, it sounds like you've learned a lot from being very, very
hands-on with this initial set of customers. So the first thing I'll ask is, at
which point in time did you say, "We're ready for GA"? Kind of, "We're done with
this." Like picking—you know, you said maybe you didn't know the words at the
beginning, but picking the customers that at least felt right and working really
closely with them. So when did the GA—when were you ready for the GA button to
be turned on?
Simon Eskildsen [38:23]:
I forgot to answer the first part of your question before, so let me go back to
that before GA, which is the trade-offs. Trade-offs are very important in a
database. Again, they're really the fundamental thing with databases and why
some databases are good at things that other databases are not as good at. I
spent so long being on the buy side of the database, and every single time I
went to a database website, I'm just like, "Where are the limits? Where are the
trade-offs? And where's the architecture doc?" Those are the things I care
about. I just need to load this mental model into my head ASAP and know what
it's not good at because otherwise, I can't tell what it's good at. The
fundamental trade-off for turbopuffer is that the writes are slow, right? We
have to commit them directly to object storage. It takes hundreds of
milliseconds. So you're not going to do an OLTP workload on something like that;
it just doesn't make sense. You can't do checkouts for e-commerce on something
like that unless you get it all done in a single transaction. Then once in a
while, you'll hit a node that doesn't have the data on disk, and you'll get a
cold query, and it will be a couple hundred milliseconds instead of tens of
milliseconds. For search, that's a perfectly acceptable trade-off. For
high-frequency trading, maybe not so much. We happen to think that this set of
trade-offs is pretty phenomenal for a lot of workloads, especially search.
Barr Yaron [39:39]:
I mean, I really agree, and I think your website shows that fundamentally well.
Like you can slide and understand how much you're paying; you can go and very
clearly see what turbopuffer is good for, what turbopuffer is not good for. So
that's very, very kind of customer-centric and clear, which I think resonates
with the types of people you're selling into.
Simon Eskildsen [39:57]:
Yeah, I mean, I just wanted the website to be the website that I would have
wanted. So on GA, basically, we just shipped when it was ready. All of our
engineers spend a lot more time and gravitate more towards writing Rust than
React. So I wouldn't say that it was like, "Oh, we were at this point in this
curve and blah, blah, blah." We probably could have GA'd in January; we
happened to GA, what, like two or three months ago or something like that, maybe
a little bit less. It was really just when it was ready. We hired someone to
maintain the front end, because it had been me, and I got busy with a lot of
other stuff.
Barr Yaron [40:37]:
Well, and they're deleting all of Simon's code.
Simon Eskildsen [40:40]:
Yeah, I mean, the thing now is that all the Rust engineers who are
complaining about my code have to go rewrite all my JavaScript code, and
they don't want to do that. So we're hiring some other people to go do that. GA
was really about that. I don't think it was a maturity thing. I mean, there's
always things that you want to improve in your product. But I think we feel
really good about the offering that we have. Some of the things we wanted to do
is that we wanted to scale a little bit of the support and go-to-market staff
that we had to make sure that all of our customers are really well supported if
they run into anything that we can help them with. But in general, there is no
big-brain game around when to GA or not, other than this is when we
feel that it's ready, and you know, it feels ready. That was at the beginning of
this year where everyone felt very comfortable with going GA. It was sort of a
question we asked each other monthly, and it's like, "Ah, we have a bit much
going on right now." Around the beginning of the year, we were like, "Yeah, I
mean, it doesn't matter; anytime." So I would say it was pretty vibes-based.
Barr Yaron [41:48]:
You know, we talked about first turbopuffer V1, turbopuffer V2. We could
probably spend another three hours going into each of the optimizations, but if
we just roll the tape forward and think about, you know, what does search look
like in five years, and what are some of the demands as folks move to more
agentic workflows, what do you see as the database needs of the future?
Simon Eskildsen [42:13]:
Yeah, to pattern match a little bit across what we're seeing from our
customers, the wave of AI companies that are doing well are trying to find more
interesting ways to connect more data into their products to make the LLMs more
useful. Where we feel right now that LLMs are better than any person, on the
time frame they operate in, is doing research on something. They are just
phenomenal at this and at generating reports over enormous amounts of data. What
we see our customers want is just to search more data, and I think we can help
them with that, and I think search will do that too. I think that a lot of
search is going to be the LLM doing the
search more so than the human. There's probably going to be a 100 to 1 ratio,
something like that, I don't know, of agents and humans doing the search. But
it's very clear that even if the context window goes to infinity, it's just
going to be a lot cheaper for them to converse with the data in some way. It's
never going to make sense to put a billion rows into a context window and ask it
to do analytics on a dataset. It's never going to make sense to do ACLs in a
context window. Recall is always going to suffer a little bit. So there will be
some combination of this. I have no idea where search is in five years, but I
think our customers have a really good sense of what they want us to ship in the
next three to six months. We're very focused on listening to our customers and
pattern matching across them and working very directly with them to figure out
whether they really need something so we can make sure that we maintain
simplicity in our product. So the long story—that's maybe the long answer to
your question. The short story is we don't know, but we listen to our customers.
They don't know either, but they know what they need right now, and if we
continue to do that long enough in a principled way, I think that will serve
them really well.
Barr Yaron [44:21]:
I love that answer. It's honest. Yeah, and it's working. One of the things that
has come up throughout this entire conversation is this customer centricity. How
do you hire the right team that cares about this and that is able to, I guess,
engineer at the level of nitpicking your JavaScript code?
Simon Eskildsen [44:38]:
I mean, the short answer, I think that we just invite our customers into Slack
channels, and our engineers too. I think our engineers take a lot of pride in
the stuff that they work on, and then loosely will pattern match on, "Oh yeah,
this seems related to something that I'm working on. I'm going to dig into
this." I think we have a lot of trust in the customers that we work with that if
they report an issue, that there's almost always something there. That mutual
trust, I think, just shows, and it means that our engineers want to engage
directly with our customers. I don't think I have any secret answer to this
other than that this has felt like a very natural way to build our business. It
felt very natural that we needed to work very closely with our
customers, and our customers really liked it. They've said things like, "We feel
like you're a high-performing team inside of our company."
Barr Yaron [45:32]:
But are you screening for something at the door when you're interviewing
candidates, when you're bringing people to come work at turbopuffer? Like, how
do you balance that kind of cracked technical engineer with care about the
customer? Are you looking for it explicitly?
Simon Eskildsen [45:50]:
I don't think I've met a p99 engineer who doesn't care about the customer
experience. So it's not something that we screen explicitly for. I think we make
it very clear how we think about our business. We don't have an interview
session that's like, "Hey, you're on a customer call, and they're running into
this bug. What are you going to do?" Maybe we should; I don't know, but I
don't think it would be very high signal.
Barr Yaron [46:18]:
You don't know what five years out looks like; neither do your customers, and
you're doing this. They know what they need for the next six months; you're
adjusting and operating very quickly on that. Is there something that AI teams
are doing today that you're confident is going to be considered bad storage
hygiene in a few years, even if you don't know exactly what it will look like?
Simon Eskildsen [46:38]:
No, I think that some customers should be doing more bad storage hygiene than
they do. We work with one customer, and they wanted to ingest a lot of data from
third parties like Google Drive and others. I asked them, "How are you going to
do ACLs? It seems very complicated to implement the Google ACLs." They're like,
"Oh, like with your economics, we're just not going to deal with it. We're just
going to have a complete copy of the Google Drive per user with the ACLs they
have access to." Because that allows them to go to market quicker, and they'll
solve the ACL thing later as an optimization. I think that's exactly the right
way to think about it. I think that the pace that companies are moving at right
now is faster than anything I've seen before. I mean, it's only reminiscent of
the fastest pace that I saw inside of Shopify as it was going through
hypergrowth in the 2010s, but it feels like so many companies are moving at that
pace. I love working at this pace. To work at that pace, you have to make some
of those trade-offs. I don't think there's anything that our customers are doing
that's like bad storage hygiene. I think we see our customers run very fast with
turbopuffer by abusing these economics to go to market quicker with product.
Barr Yaron [47:59]:
Yep. Yeah. And yeah, in many ways, that is the Jevons paradox that you want for
now. So I will ask you one more question. And thank you so much, Simon. I'm
curious how you think you've changed as a leader, as a person since you started
turbopuffer.
Simon Eskildsen [48:13]:
This is a very good question. I think what it comes down to is that we have some
very simple principles that we operate on as a company. Some of these we believe
very strongly in, and we try to put them into everything that we do. I think
that for a while, I didn't have as much conviction that these principles would
work. Like, I didn't know whether being this customer-centric would work, but
we've seen it work. So we're continuing to do it and directly working with the
customers in a way that may be unusual. I think a lot of my growth as a leader
has been to just trust that these simple principles, when applied for long
enough, will do the job, and you don't have to sit down and come up with some
strategy. "Strategy" almost feels like a banned word at turbopuffer, right? That is not what we
do. We have simple principles around how we do it, and we align on those
principles, and you do that for long enough, and I think you can build a really,
really great company. I didn't have the confidence for that two years ago.
Barr Yaron [49:15]:
I think a lot of what we had talked about is thematic with that, right? Like
that first initial customer and following your intuition with Cursor, the
Readwise aha moment without knowing exactly how big the market is, and things
have compounded on top of that. So I know we're at time, but thank you so much,
Simon, for coming on and taking the time. Always fun catching up.
Simon Eskildsen [49:33]:
Thank you for having me, Barr.