Aaron Francis [0:00]:
Hey y'all, welcome back to the show. This is Database School. I'm your host
Aaron Francis and today I have Simon Eskildsen on. He is the co-founder and CEO
of a very cool company called turbopuffer. And turbopuffer is a serverless
vector search and full-text search engine, interestingly built on top of object storage.
It's built on top of Amazon S3 and, as we get into in the interview, also on
Google Cloud and Azure Blob Storage. The interesting thing about this is it
makes it obviously extremely scalable, but also way, way, way cheaper. He talks
about some of the numbers in this episode, the number of vectors they're doing.
It is insane. And weirdly, their first customer was a little company you might
have heard of called Cursor. I don't know how that's their first customer, but
this is a great show. He's very, very technical. I think you're very much going
to enjoy this one. Please subscribe to the YouTube channel or visit the website
at databaseschool.com and let me know what you think. Enough from me, let's get
to the show with Simon. I am here today with the co-founder and CEO of
turbopuffer, his name is Simon Eskildsen. We're going to talk about a lot of
stuff. We're going to talk about scaling. We're going to talk about founding
companies. But Simon, do you want to introduce yourself a little bit further?
Simon Eskildsen [1:29]:
For sure. Yeah. I mean, you did a good job. Today, I spend most of my time
working on a search engine called turbopuffer and building the company around
it. I came of age spending almost, you know, just shy of a decade working on
infrastructure at Shopify, pretty much anything to make sure that the site
stayed up and kept scaling for all of the big customers and the tremendous
growth of that company through the 2010s. And then after that, I helped some
friends with infrastructure challenges at their companies, kind of in little
three-month stints. I called it Angel Engineering. I took equity for improving
Postgres query plans and optimizing AutoVacuum for a few years. And that's where
I discovered the problem that eventually led to turbopuffer: that the incumbent
search engines were not a delight, which was exactly my experience at Shopify.
Now I've run turbopuffer for about two and a half years alongside a fantastic
team here, and we're very proud to power many of the companies at the frontier
of AI.
Aaron Francis [2:36]:
Wow, that was tight. That was really well done. I want to talk about what felt
like the glory days for Rails engineers—the era you're describing: early 2010s,
Heroku, GitHub, Shopify, everybody on Rails, and you all were the cool kids. I
was so jealous of that whole scene.
Simon Eskildsen [3:00]:
What do you mean—like, whether I was living that Rails story, or something else?
Aaron Francis [3:04]:
Right, sorry—I'll set my side first so you know where I'm coming from. I wasn't
in the Rails or San Francisco crowd; I was in PHP. I love PHP, but I wasn't one
of the cool kids. I graduated in 2011 with a master's in accounting, spent a
year at Ernst & Young as a tax accountant, and that stretch is where I felt
totally lost—the work wasn't for me. I'm still glad Ernst & Young gave me that
year; I just mean I was off in a different world from the Rails scene. I missed
the wave of Rails companies that became billion-dollar businesses. We didn't
really have that parallel story in PHP land.
So I'm curious about your path: you were at Shopify for, what, eight years, scaling from something like a thousand requests a second to, call it, a million—at least the right order of magnitude. What was that like? What did you come in as—just a Rails developer? What did you end up as? What was the journey through that hypergrowth?
Simon Eskildsen [4:03]:
That's exactly right. I started... I mean, I started in PHP too. It's so good. The feedback loop is so good. I still miss it, you know, creating websites for the World of Warcraft guild. And then at some point, you're writing the same thing a million times, so you build your own framework—that's how it goes. I probably did that too: built my own PHP framework from scratch. And then this dude in my World of Warcraft guild was like, "Have you looked at Ruby on Rails?" You'd go to the website back then and think, "Is this a toy? What is this?" I was very intrigued. So I got into Rails and spent a lot of time with it in high school, working for a startup doing Rails. And then Shopify found me because I wrote this article about dropping my iPhone and it just dying. You know, the 2009, 2010 iPhones—you dropped them once. Actually, this is funny to think about: I dropped my iPhone this morning and it didn't get destroyed. But back in 2009, if you dropped an iPhone once, it was dead, right? Screen smashed, game over. So you guarded that thing like an absolute... like, you know...
Aaron Francis [5:23]:
OtterBox cases were a thing. I remember that.
Simon Eskildsen [5:27]:
...and I didn't have one of those, and I just dropped it, and it got destroyed. And then I wrote an article about how I suddenly had my sense of direction back because I wasn't looking at a map all the time. I had an attention span again; I'd just sit and look into the ether, and it was actually pleasant. This was before we had realized all these pernicious effects of smartphones—it just wasn't a thing yet. Anyway, around 2013 the article blew up, and Shopify reached out. I don't think they realized I was in high school until I told them the start date had to be after the semester. I was like, "Aren't you...?" Anyway, I started there in 2013 as a Rails engineer, and one day I sat down at lunch and was so intrigued by what the infrastructure people were talking about. It was all load balancers and, you know, Squid here and Varnish there and Nginx—they probably used them all together, even though that was crazy. And I didn't understand: what's a proxy, what's a reverse proxy? I'm still not sure I know what a reverse proxy is; I don't know if anyone knows why it's called that. But it was very intriguing to me, and I just started picking up stuff from their issues and doing things with that team. And then somehow I found myself there working on Docker—2013 was a crazy time to get into Docker—and I got Shopify running in containers, and that was my foray into infrastructure from Rails.
Aaron Francis [7:07]:
Yeah, it's funny that you talk about dropping your phone in 2013 and the world opening up before you. How quaint, right? Dude, can you imagine, in 2013, looking forward to 2025 and realizing how much we were all going to be on our phones? And you had that realization a decade ago. Man. Another thing I want to highlight, something I talk about quite often: one really needs to be writing and putting their work out there—"you" in the general sense, not you specifically. Even if that's tweeting or making videos or posting on LinkedIn, God forbid. Your story is very similar to how I got my first PHP job: I was doing something, I wrote about it on my blog, and some company reached out and was like, "Hey, we're doing that too. Can you come work for us?" And they offered what was, to me, a huge amount of money. I was like, "The internet is amazing." So it's nice to hear stories where you put something out there and good things come your direction. That's really encouraging for me.
Simon Eskildsen [8:10]:
I feel like another way to describe my teenage years is as a series of blog posts that made it to Hacker News, even if only for a brief period of time. That's how I got my first job, it's how I got to Shopify, it's how I got my second job. Well, actually, the job I had in high school was allegedly because one of the co-founders was looking—I grew up in a small town in Denmark, and they were looking for people who were committing on GitHub. I should do this; I need to take a note of that. You know, just committing to GitHub in 2009, 2010 was a super strong signal—and committing between the hours of 2 a.m. and 4 a.m. from this small town, even more so. They somehow found me through that. That was their talent sourcing process. I need to do that: you go to some tiny town in Nebraska, you zoom in, you geofilter down, and you find Alex, and Alex is just up at 3 a.m. slinging C code because she still hasn't heard of Rust—no one in town has.
Aaron Francis [9:23]:
That is a great story. Yeah, I feel like back then, GitHub, you actually
followed people and it was a social network, and then there were forums where
people would actually hang out and talk, and you could really find some
undiscovered talent in those places. And I don't know if that's still the case,
but hey, it's worth looking into. So when you started as a Rails engineer, talk
to me about some of the scale that you saw maybe in the second half of your
career as you transitioned over to the infrastructure side or started getting
really good at databases. What kind of scale growth did you see at Shopify while
you were there, and what were some of those challenges?
Simon Eskildsen [10:03]:
Yeah, when I started working on infrastructure at Shopify, it was a couple
hundred requests per second. And the last BFCM before I left was around a
million requests per second. On a Ruby on Rails cluster, that is quite sizable scale. My time there was spent making sure that everything in the core app and all the core database stuff just worked. We were a team of comrades on the last-resort pager—about six to eight of us—dealing with whatever came up. I was often working on the longer-term database projects: making sure we could run in multiple regions, making sure we could fail over a region with very little notice, sharding everything as hard as we could on the shop ID. That's what I worked on for a very long time—everything caching, everything database—because that's the hardest part to scale when you're scaling something as quickly as we scaled Shopify. It was the journey of a lifetime. I also worked on search. And Justine, my co-founder, and I rewrote the entire Shopify storefront and built a wonderful team to do it. Applying everything we'd learned about performance and about running Shopify, that rewrite was serving almost all of Shopify's storefront traffic within 18 months, even though the app is almost 20 years old. Just so many fantastic things we built. Nginx and Nginx Lua were the secret weapons for running all of Shopify—for shifting traffic so that when Kylie Jenner was selling lipsticks... you know, Aaron, I don't know if you have a Shopify store, but if you do, Kylie should not be able to take it down, right? Prioritizing traffic, multi-tenancy engineering. I would never have guessed how much this would translate into starting a database company, which I always felt incredibly insecure about, because I hadn't written a database. I've used databases for a very, very long time, but most database founders have built databases their whole career; I've mostly just used them and bought them and looked at a lot of websites for them. I think our website is sort of the antithesis, because a lot of the time when I went to a database website back at Shopify, I felt like half the time I couldn't figure out whether they were selling some cool new bespoke fashion product or a database. And I just really wanted to know: what are the trade-offs you've made? What guarantees do you provide? What does it cost? How is it to run? How does the architecture slot into my mental model? Those are the questions I hope people feel are immediately answered when they go to our website.
Aaron Francis [12:44]:
Yeah, yeah. I want to get to what turbopuffer actually is. But right before we do that: how fondly do you look back on your time at Shopify? Like when you
look back on that, what are the feelings that you have?
Simon Eskildsen [12:58]:
I think the feelings are... There's a video my dad has of me as a kid, where I've just gotten off a roller coaster at Disneyland Paris and I say something along the lines of, "That was really fun. I never want to do that again." I think that's the feeling you walk out of a hypergrowth, generational-company run with, if you have really given it everything you had.
That's how I felt. And when I stepped away from Shopify, there was some recovery
to be done. And I don't think that I ever thought that I would step back into
the ring and the fire again. But once you've experienced it once, you end up
chasing it for the rest of your life, even if it's with some periods and
different seasons of life in between.
Aaron Francis [13:56]:
Yeah, no, I totally resonate with that. I mean, that's how I felt after our
first set of twins was born, and then we had a second set. So I get it. I get
it. That was, yeah...
Simon Eskildsen [14:07]:
Wait—you have two sets of twins. Four kids, and at one point you had
two-and-a-half-year-olds and two newborns at the same time. I don't know how
you're standing here today. I don't know how you do anything. That is more
remarkable than anything any guest has told you on this show. That is epic.
Aaron Francis [14:22]:
Yeah, never want to do it again—but here we are.
Aaron Francis [14:32]:
Yeah, needless to say, I resonate with that feeling deep in your soul of, man,
that was awesome. Boy, am I exhausted. So I get that. So you come out of Shopify
and you spend these three months at a time, you know, going around and helping
friends, you know, dusting off some of your war stories and taking them to other
startups. And then how long did that period last where you're doing this
consulting? And then we'll get to turbopuffer.
Simon Eskildsen [15:00]:
Yeah. At first I was just like, this is the summer where we get the deadlift up as much as possible—the summer of Simon. And I loved it. I was writing napkin math posts and paddleboarding like a
maniac and working out five times a week, and it was glorious. And then by the
fall, I started getting bored with that. And that's when I started just sort of
in three-month increments joining my friends' companies and vesting a bit of
equity in return and helping them with infrastructure stuff. And the common
thread was basically: at Shopify we used MySQL, so if you're using MySQL in the
2010s, you feel incredibly gaslit by the orange site that you made the wrong
choice in 2005. It doesn't matter how good your business is; it could be better
if you're using Postgres. And so I was very excited to finally get my hands on
this thing. And it turns out that it's just a completely different set of
problems, and one of Postgres's trade-offs is tuning AutoVacuum. That's what I spent a lot of 2022 doing. And one of the companies I was
tuning AutoVacuum for was this company called Readwise. Readwise is a way to save articles and books and whatever to read later—highlight them, retain them, search them. And they asked me if I could build a little
recommendation system. And I'd heard of these things, but, I mean, I'm a database nerd—I don't really know anything about recommendation systems. So I just started researching, and I found this thing called vector embeddings, right? Basically, you feed an LLM a bunch of content, you chop the head off the model, and out comes a coordinate in a coordinate system. And magically, coordinates that are adjacent in the coordinate system belong to content that is similar. That was really cool to me, so I read a bunch about that—how do you build these models? And what occurred to me is: these embedding models are effectively trained on the articles, because they're trained on the public web, so they should be really good at this. So over the course of a week I built a very small recommendation engine that would take articles you've read, compute vector embeddings, and find other articles that were similar. And it kind of worked. It was at least good enough that one of the team members started getting recommendations for articles around pregnancy and so on. I didn't know his wife was pregnant, but it leaked through the article recommendations—it was at least that good. The result was fine, not excellent, but promising enough to keep going in this direction. And I ran
wasn't excellent, but it was promising enough to go in this direction. And I ran
the back of the envelope math on bringing this to production on one of the
vector databases at the time, and it would have cost 30 to 40 grand a month. And
this was a company that was paying three to four grand a month for their
Postgres and maybe a couple other thousand per month for all their other
infrastructure. So you're almost asking them to spend an order of magnitude more to power the infrastructure for one feature. So it entered this bucket of "We'll do this later, when the costs come down." And that problem haunted me, because I looked at the pricing and thought: clearly, they're doing everything in DRAM and replicating it. There has to be a better way. And so I maintained this repository. Have you seen the napkin math
GitHub repository? Have you ever encountered it? It's essentially a really shitty Rust script I wrote that figures out memory bandwidth, disk bandwidth, all these different base numbers. And it's very useful, because you can do things like: "This algorithm is taking four seconds, but it's traversing 20 gigabytes, and you can do that in a second on a modern machine—so what's this 4x gap?" And you figure it out. This, by the way, is a way better method than profiling. Profiling is for iterative improvement; napkin math is very good for finding order-of-magnitude improvements. And I just started doing the napkin math on building a search engine where you put everything on S3—everything, nothing else, no Raft or Paxos or consensus bullshit, just everything on object storage. And it seemed like you could actually do it. The fundamental trade-off was that the write latency would be higher, and once in a while, when you queried something before it was loaded into cache, it would be a little slow. But you might be able to pull it off. So I holed up at my cabin in rural Quebec for the summer of 2023 and just forced my way through, trying to find a way to make this architecture work, because it felt very clear to me at the time that if I wasn't going to do it, someone else was. Object storage had only gotten the right primitives to do this very recently; NVMe SSDs were recent-ish, right? It felt like the perfect moment and the perfect workload. I couldn't articulate it very well at the time—it was like, there's something here—and I was trying to explain it to my wife, and she's like...
Justine Li [20:21]:
I love you, Simon. I have no idea what you're talking about.
Simon Eskildsen [20:26]:
Yeah. And I released it in October, and it was like the fourth rewrite, and I
was going manic, and there were so many things that didn't work, but it just
needed to get out there. At that point, it was just like, "Fuck it, here it is."
Can I swear? I feel like I've sworn a lot. Is that cool?
Aaron Francis [20:45]:
Go for it.
Simon Eskildsen [20:46]:
All right.
Aaron Francis [20:46]:
Everyone's an adult here. I think it's probably only the 30-and-older crowd listening to this show, so it's fine.
Simon Eskildsen [20:54]:
Dropped it on X, and it got some good traction, and people were encouraging.
Aaron Francis [20:59]:
As an open source...
Simon Eskildsen [20:59]:
That's sort of... Sorry, no, it was not open source. I didn't have time to open
source it. Yeah, it was a product. It was a SaaS product. Like, give me your
vectors, I'll give you the results. It was just that. And it got really good
traction. That encouraged me to keep going. And then a small company in San
Francisco reached out, and they were having trouble with, "We're paying way too
much per user to store all these vectors and code bases." And I'd never heard of
this company before, but for whatever reason, something compelled me to go visit
them. Probably it was because they were so busy that they kept not showing up to
our scheduled calls, and I get it. And then so I went to San Francisco and had a
great chat with them, and they decided to adopt this little database called
turbopuffer, this little innocent thing. And of course, that company was Cursor.
And yeah, just a very special relationship. And anytime I got the opportunity
to, I would pass on as much knowledge as I had about tuning the Postgres
AutoVacuum, or anything I'd learned from Shopify, to them. They don't call me anymore, because they're probably better at it now than I ever was. But it was a
really fun relationship between two very early companies. Cursor was a lot less
early than I was at the time, but they really took a chance on turbopuffer, and
they really took a chance on me and Justine, my co-founder. And as all of this
was transpiring, I was just thinking, who would be the best person that I know
that I could do this with? And Justine just immediately came to mind. She was
the one that I'd worked on so many incredible things with at Shopify, everything
I'm proud of there, and even projects that neither of us are proud of, we've
worked on together. She was just the first to come to mind, and watching her do the things she's now able to do—it's incredible.
Aaron Francis [22:51]:
This is an amazing story. So you and Justine—is it just the two of you? Y'all are the co-founders?
Simon Eskildsen [22:59]:
Yeah.
Aaron Francis [22:59]:
Okay, so you and Justine founded this thing, launched it a couple years ago, and
somehow either the first or the first big or one of the first customers was
Cursor, which is insane, by the way. Just insane that that was one of the first
big ones. So backing up just a little bit, you're seeing... you've seen in your
experience people struggling with search, recommendation, relevancy, that sort
of stuff, and you're seeing these primitives come online with S3 and thinking
these two things can meet, and it seems like there's something that should be
done here. And you go off into the cabin and write it up, and it works. And so I
want you to give me the elevator pitch for turbopuffer, and then I want to talk
about what you discovered about S3 and kind of how you architected this whole
thing such that you can make this work. So what is turbopuffer?
Simon Eskildsen [23:54]:
turbopuffer is an object storage... I mean, it depends a bit on who we're
talking to in the elevator here, right? So we'll hit it from a few angles here.
turbopuffer is an object storage-first database. It is a search engine that
allows you to connect enormous amounts of data to AI. Examples are Cursor,
right? And Notion, Linear, and many, many others. And the reason they choose
turbopuffer is because we can index and connect more data to AI than they were
able to do before. So Notion used to have a per-user AI cost, and when they
moved to turbopuffer, they could take that away, and they could index more data
than they had before. Cursor used to index on the order of thousands of files before turbopuffer, and now they index as many as they can find. The list goes on: we help our customers realize the most ambitious versions of their products.
Aaron Francis [24:51]:
Okay, I love it. So a couple of things stood out to me in that succinct pitch.
One is object storage, obviously. That is unique. And the other is you said both
database and search engine, which don't always go together, right? And so talk
to me about, like, fundamentally when you were starting this, what was the thing
that allowed object storage to work here where maybe three, four, or five years
before, object storage was like, "No, that's never going to work." What was it
that you saw about object storage that changed your mind?
Simon Eskildsen [25:25]:
I'll explain this in a way that I like to think about it now in that if you want
to create a big database company, you need two things. The first thing that you
need is that you need a new workload. If you want to create a really big
database business, you need more or less every single company in the world to
have a use case for your new database or at least consume one or more products
that have a strong use case for your database. That's ingredient number one if
you want to build a big database company. The second thing you need to build a
big database company is that you need a fundamentally new storage architecture.
Because I can promise you that if you have a new workload, all the other
database companies are going to look at that new workload and say, "I want
that." And there's no good reason why they shouldn't get it and fragment the
market unless you have a fundamentally new way to store the data that they can't
rewrite everything to do because it would screw up their guarantees and
trade-offs. The new storage architecture is to store everything on object
storage and then puff it up into NVMe SSDs and DRAM as you query the data,
almost like a JIT compiler. That's the new storage architecture. It is only
possible now because of three things that have changed. The first is that NVMe SSDs became available in the clouds around 2017, when they launched in AWS, and even then there were very few of them and very few SKUs. NVMe SSDs are about 100 times cheaper than DRAM, but only around five
times slower if you use them correctly. Using them correctly means that you have
to have a lot of outstanding requests for every round trip, right? So you can't
just use them like you use DRAM, like, "I want this, and I want this, and I want
this in random order." You have to say, "I want this one-megabyte chunk, and then I want this other one-megabyte chunk," and you have to do all of that in parallel requests, and the storage engines have not been built for that. The
second thing that changed with storage architecture is that S3 only became
consistent in 2020 at AWS re:Invent. So that would have been December. This is
completely overlooked because everyone just assumes this, but that is not that
long ago, especially not in database land where it takes a long time to mature.
The third thing that happened that's very important is that S3 gained compare-and-swap—conditional writes—remarkably recently.
And what compare-and-swap allows you to do is to not have a separate consensus
layer on top of your database to ensure metadata has not been changed in the
interim. That means you don't need another metadata layer. That means that you
can build a database where you just run everything on stateless nodes and the
only state is in object storage, which means it is more cloud-native than any
other database architecture that we can think of. And it's also fundamentally
the cheapest way to run a database in the cloud. Those are the three things that
changed that allowed the new storage architecture. That's what makes the economics and the underlying architecture fundamentally different from MySQL, from Snowflake, from Databricks, from MongoDB, from all the existing databases. Those are the two things you need: the new workload and the new storage architecture.
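To make that compare-and-swap primitive concrete, here is a minimal sketch of optimistic concurrency against S3 alone—no separate consensus layer. It assumes a recent boto3 that exposes S3's conditional-write (If-Match) header; the bucket, key, and metadata layout are invented for illustration.

```python
import json

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET, KEY = "puffer-demo", "AaronDB/meta.json"  # invented names

def commit(mutate):
    """Read-modify-write the metadata object with no consensus layer:
    the PUT succeeds only if nobody else wrote since our GET."""
    while True:
        obj = s3.get_object(Bucket=BUCKET, Key=KEY)
        etag = obj["ETag"]
        meta = json.loads(obj["Body"].read())
        try:
            s3.put_object(
                Bucket=BUCKET,
                Key=KEY,
                Body=json.dumps(mutate(meta)).encode(),
                IfMatch=etag,  # compare-and-swap on the object's ETag
            )
            return
        except ClientError as e:
            code = e.response["Error"]["Code"]
            if code in ("PreconditionFailed", "ConditionalRequestConflict"):
                continue  # lost the race: re-read and retry
            raise
```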
Aaron Francis [28:50]:
That is very helpful. Now describe this part to me: you said you "puff it up." Where does the data live long term, and what's the process it goes through to fulfill the user's needs? Walk me through a request lifecycle—a search query, whatever example you want to use.
Simon Eskildsen [29:13]:
Yeah. So what happens? When you do a read, you will basically go to a load
balancer, and the load balancer will say, "Well, what table is this for?" We
call this a namespace. The namespace is logically just a directory on S3, right? So one could be called AaronDB, and one could be called SimonDB. It could also be, you know, Notion customer N, or Cursor code base Y, right? These prefixes are what we call namespaces. It's a new database construct. So if I am querying AaronDB, then I go to the load balancer.
The load balancer will hash AaronDB and then send me to node two out of 128 or
whatever. It will look at the DRAM cache and see if it's there. And if not, it
will go to the NVMe SSD cache. If it's not there, then it will go to object
storage and get the blocks directly. turbopuffer's database engine is completely optimized for object storage, to the point where when we fetch data, we do a range request directly into the S3 file for exactly the bytes we need, because we know where they are in the file. A lot of databases that
have tiering will basically download an entire file from S3, hydrate it into
cache, and so on. But we can operate directly against S3. When you have a cold
query that misses DRAM and misses the SSD and goes directly to object storage,
the latency is around a second, 500 milliseconds to a second, depending on how
hot the S3 prefix is. If it's on disk, it's often less than 100 milliseconds.
And if it's in memory, it could be around 10 milliseconds. So those are the orders of magnitude for the cache tiers, right? 10 milliseconds, 100 milliseconds, and about a second. In practice it can often be faster, but that's a useful way to think about it.
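As a rough sketch of that read path: hash the namespace to a node, then fall through DRAM, the NVMe SSD, and finally a ranged GET straight into the S3 object. Everything here—the node count, the cache objects with get/put, the locate() metadata lookup, the file layout—is illustrative, not turbopuffer internals.

```python
import hashlib

NODES = 128

def route(namespace: str) -> int:
    """Load balancer step: hash the namespace to one of the nodes."""
    return int(hashlib.sha256(namespace.encode()).hexdigest(), 16) % NODES

def read_block(namespace, key, dram, ssd, s3, bucket, locate):
    """Fall through the tiers: DRAM (~10 ms), NVMe (~100 ms), then a
    ranged GET directly into the S3 object (~0.5-1 s when cold)."""
    blob = dram.get((namespace, key))
    if blob is None:
        blob = ssd.get((namespace, key))
    if blob is None:
        start, length = locate(namespace, key)  # offsets from metadata
        obj = s3.get_object(
            Bucket=bucket,
            Key=f"{namespace}/data.bin",
            Range=f"bytes={start}-{start + length - 1}",
        )
        blob = obj["Body"].read()
        ssd.put((namespace, key), blob)   # "puff it up" for next time
        dram.put((namespace, key), blob)
    return blob
```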
Aaron Francis [31:03]:
How do you know when you're going to... let's say you fall all the way through
and you got to go to S3. How do you know which range of that file to get? Like,
who told you that? And second question, what is in those files? What is the data
that you're storing in those files, and how are you storing it? Like, are you
breaking it up? Is it like compressed? Like, what does that actually look like
in there?
Simon Eskildsen [31:31]:
It might be useful first to talk about turbopuffer V1, like that thing that I
hacked up at the cabin in Quebec, and then, you know, brought poor Justine in to
help optimize it before we could get to V2. V1 was... You take all the vectors
and you cluster them. So should we do a little detour here into what a vector
index is?
Aaron Francis [32:00]:
Is now a good time for that detour?
Simon Eskildsen [32:02]:
Yes—please. The simplest way to do vector search is that you have a... well,
first maybe I'll explain even what a vector is. So we got a little bit into
this, but I was feeling funny before in the way that I explained it, so we'll do
it properly now. The way that I usually explain it, and you should ask
follow-ups here because we're backing into a long-winded answer to your original
question. A vector is a point in a coordinate system that represents a piece of
data, and the point is adjacent to other things that are similar, right? I'm
standing in front of a table. So the table would be in the coordinate system
here, and right next to the table would probably be a chair, but closer to the
table might be a dining table, right? So you imagine that you train a model that
is very good at taking content that is similar and putting it in the coordinate
system.
Aaron Francis [32:56]:
One clarification before you go on. I'm picturing XY—table here, chair
there—because I'm human and two dimensions are intuitive. But in reality it's
not 2D or 3D; it's hundreds or thousands of dimensions—768, whatever. Is that
right?
Simon Eskildsen [33:31]:
That's correct. I used two dimensions because I can't visualize 768. I'd need
your twin superpowers for that.
Aaron Francis [33:45]:
Hard pass.
Simon Eskildsen [33:46]:
Fair. Want me to explain it in one dimension instead?
Aaron Francis [33:51]:
Yeah—that would help.
Simon Eskildsen [33:54]:
Exactly. And dear listener: when we say 2D or 3D here, read that as "a huge
number of dimensions."
Aaron Francis [34:01]:
Okay—keep going.
Simon Eskildsen [34:03]:
Just to really hammer it home: imagine Spotify. They have all of their songs in a big coordinate system, and you can imagine a rock cluster and a pop cluster, and as you zoom in, smaller clusters inside those. That's how the simplest form of a vector index works. You cluster the data to find patterns in the coordinates. And then you say, when I'm
searching for a song, I'm just looking at the clusters that are similar. You
basically take an average of everything that's inside of a cluster, so the
average of all pop songs and the average of all rock songs, and you search for
your query vector and say, "Well, it's closer to the rock, so I'm only going to
look at the rock songs." And suddenly you're searching 50% less data.
Aaron Francis [34:42]:
So your universe got a whole lot smaller.
Simon Eskildsen [34:45]:
And that's how you build a vector index. There are a lot of challenges with that—we could get into them—but that's the simplest form. So what turbopuffer V1 did
was that it basically took all the data and then it built a bunch of clusters:
rock, pop, hip hop, whatever, and then it created something like centroids.bin
on S3 for a namespace—think AaronDB.
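A toy version of that build step—scikit-learn's KMeans stands in for whatever clustering turbopuffer actually used, and the file names follow the episode's examples.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_v1_index(ids, vectors, n_clusters=128):
    """Cluster the vectors; return one small centroids file plus one
    members file per cluster, ready to upload under the namespace."""
    km = KMeans(n_clusters=n_clusters, n_init="auto").fit(vectors)
    files = {"centroids.json": km.cluster_centers_.tolist()}
    for c in range(n_clusters):
        rows = np.where(km.labels_ == c)[0]
        files[f"cluster_{c + 1}.json"] = [
            (ids[i], vectors[i].tolist()) for i in rows
        ]
    return files  # e.g. upload each entry under AaronDB/ on S3
```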
Aaron Francis [35:10]:
So who... where was that? Like, let's say that I am... I'm giving you all of my
music. Am I locally creating vectors and then I'm throwing vectors over the wall
in V1? Am I throwing them over the wall to you and then you cluster them and
centroid them? Or like, who's doing what where?
Simon Eskildsen [35:27]:
That's right. You are just like... this is like Stripe for vectors. You're just
sending a vector, and then you can search the vectors, and that's really all you
can do. There's nothing else really to do. So you send all the vectors, and then
turbopuffer takes all the vectors and clusters them. Then it takes the average of every cluster and puts that centroid in centroids.json—we'll just call it that; it wasn't actually JSON, but it keeps things simple. So centroids.json is basically just an array of arrays, where every array is a centroid: the rock centroid, the pop centroid, whatever. And then we have other files called cluster_1.json, cluster_2.json, cluster_3.json, and the centroids map back to those. Now, the way
you serve the query is: you go to S3, you get AaronDB/centroids.json, and you look for the closest centroids to your query vector. And then you see, okay, this is sort of in between pop and country—there are probably some artists there, I'm sure plenty—and that maps to, say, cluster 79 and cluster 108. You download those two files, search through them, and return the results. Two round trips to S3; P90 to S3 is maybe around 200, 250 milliseconds; the query's done in 500 milliseconds.
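That two-round-trip flow as a hedged sketch—file names follow the conversation; real encodings, parallel fetches, and error handling are elided.

```python
import json
import numpy as np

def v1_query(s3, bucket, namespace, q, n_probe=2, k=10):
    # Round trip 1: the small centroids file.
    body = s3.get_object(
        Bucket=bucket, Key=f"{namespace}/centroids.json")["Body"].read()
    centroids = np.array(json.loads(body))
    nearest = np.argsort(np.linalg.norm(centroids - q, axis=1))[:n_probe]

    # Round trip 2 (issued in parallel in practice): the chosen clusters.
    candidates = []
    for c in nearest:
        members = json.loads(s3.get_object(
            Bucket=bucket,
            Key=f"{namespace}/cluster_{c + 1}.json")["Body"].read())
        candidates += [(doc_id, float(np.linalg.norm(np.array(v) - q)))
                       for doc_id, v in members]
    return sorted(candidates, key=lambda t: t[1])[:k]
```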
Aaron Francis [36:50]:
Okay, so I think I see where this is going, but I'm not going to jump ahead. So
let me say it back to you to make sure that I have it right. So I, as the user
of turbopuffer, create vectors however I want, and then I send them over the
wall to you, and I say my namespace, or maybe you define the namespace, doesn't
matter, but you put them in a namespace of Aaron's database, and you create from
all of my vectors, you kind of create like an overview, like a world map of the
clusters, and maybe there's a hundred of them. And in centroids.json, you have a
listing of all hundred of those centroids. And then I send a query over that's
like, "Hey, I want to find more artists like Taylor Swift." And you go to
centroids with Taylor Swift and say, "That's pretty much pop most of the time."
And then you go grab the pop music JSON file, pull it back, and then search
through the pop music to look for other artists that are like Taylor Swift and
then send that home to me. Good?
Simon Eskildsen [37:50]:
That's right—that's right.
Aaron Francis [37:52]:
Perfect.
Simon Eskildsen [37:54]:
So you mean you weren't already deep in the weeds of centroids.json and cluster_1.json through cluster_128.json? That was more or less V1—and honestly, "it worked" is the right summary: Cursor and Notion ran on it through the end of 2024. There was a lot less "good" in that implementation than we'd want in hindsight, but it worked. And, you know, Justine did all this crazy stuff, because I got ripped into B2B sales—everyone's dream—and Justine was deep in the code mines, making all the shit code I'd written zero-copy and all this craziness, just to make it perform. Then we hired some people who actually knew how to build databases. And Justine, just by her remarkable mind, figured out how to build databases a lot faster than I did.
But we started working on V2 in the spring of 2024, so about six months after
launch. And this is where the search-engine-versus-database distinction comes in. V1
is very much a search engine because it can only search; it can't do anything
else. V2 was a database that is excellent at being a search engine. A search
engine to me is an attribute of a database, but a search engine is not
necessarily a database. To me, a database is something that in the limit could
do any SQL query in a great way, right? It has a query plan for that. So V2 is a proper LSM. There's a key space, and all of the vector indexing and clustering is implemented on top of that key space; underneath, turbopuffer is just a KV store, and we build all this stuff on top. The LSM engine is optimized for object storage and all the trade-offs of that, which are different from a traditional architecture. And that's what we would call the database. Now, of course, you can't implement every SQL query in the span of six months on a new KV store, so the focus of turbopuffer is very much to be excellent at search. But over time, turbopuffer supports more and more queries, right? We have aggregation, faceting, richer attribute filters—all the database-ish affordances people want on top of search—but our core remains being an excellent search engine for indexing enormous amounts of unstructured data. So now we can do, of course, full-text search and all these other things.
Aaron Francis [40:22]:
Can I ask why? Full-text search makes sense to me; that's search realm. Can I
ask why you did all of those other things? Because that sounds like a lot of
work, first of all, and it sounds like search was the original thesis and very,
very hard. So why add on this other thing that is also very hard? Who was
asking?
Simon Eskildsen [40:44]:
So, when you have a search engine—basically, when you have your data in a database—you start to want to do anything with that data that's in the database. I mean, we get asked all kinds of questions, some very complicated; someone is probably going to ask about CTEs and recursive CTEs at some point, God forbid. People will ask for all of it, but of course you have to roadmap it. And the reason you want aggregations is because you want to do things like, "Well, I kind of need to know how many things are in the database," right?
Aaron Francis [41:17]:
Okay—that's the easy buy-in.
Simon Eskildsen [41:20]:
It's like, okay, that's useful. And then someone says, "Well, you know, it's really nice on an e-commerce site when you can toggle the sizes and show the facets in the sidebar." Well, that's a group-by count. And then you're like, "Well, I kind of want to know how many matches are in each document, because I'm searching over chunks of the document, or pages"—and these are aggregations. So it's not
like people right now are saying, "Hey, I want to do my revenue reporting on
turbopuffer." It's that you have this data in turbopuffer, and you want to do
very reasonable things that are still in the area of search. And so those are
the kinds of queries that we are prioritizing. But over time, there's going to
be lots of them.
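Faceting really is just a group-by count over the rows that match the query; a toy illustration, nothing turbopuffer-specific.

```python
from collections import Counter

matches = [  # rows that survived the "red shoes" filter
    {"brand": "Nike", "size": 12},
    {"brand": "Nike", "size": 10},
    {"brand": "Adidas", "size": 12},
]

facets = {field: Counter(m[field] for m in matches)
          for field in ("brand", "size")}
# {'brand': Counter({'Nike': 2, 'Adidas': 1}),
#  'size': Counter({12: 2, 10: 1})}
```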
Aaron Francis [42:09]:
So is it fair to say that you implemented a lot of this standard database stuff
to solve the business search need, even though it is not the technical search
need? Does that distinction make sense? Like, as people are doing the business
side of search, faceting is a great one. And faceting, for the listeners, when
you're like, "I want red shoes," and then like Nike drops down to 50 pairs and
Adidas drops down to 25, and you're like, "Red and size 12," and then it's like
Nike has two pairs, and you're kind of recalculating all this stuff on the fly.
That feels like the business domain of search, but under the hood, that's the
technical domain of like, "Well, that's just kind of more traditional
databases." Is that fair to say that distinction there?
Simon Eskildsen [42:52]:
Um, I think so. I think it's just that people just need certain... you know, I
think of all queries as SQL queries, and there's just a bunch of SQL queries
that you expect out of a search engine. It may not look like SQL because if I
shipped SQL to turbopuffer right now, if you, Aaron, were using it, you would
just get really annoyed at all the things that we don't support. But all...
Aaron Francis [43:15]:
Right—the WITH clauses, CTEs, recursive CTEs, all of it.
Simon Eskildsen [43:17]:
Recursive CTEs are a godsend—you can do wild things with them. I've basically
implemented spreadsheet logic with recursive CTEs—while loops in SQL. DuckDB is
very good at them.
Aaron Francis [43:27]:
Someone over there really nerded out. I don't think MySQL even had them for the
longest time.
Simon Eskildsen [43:34]:
MySQL 8 does, at least—I don't know whether the test suite covers every edge
case—but the broader point stands.
Aaron Francis [43:42]:
Yeah.
Simon Eskildsen [43:44]:
There are a lot of things people expect from a search product that end up
looking like traditional database queries, even when the API isn't SQL.
Aaron Francis [43:53]:
So that explains to me a little bit of the difference between V1 and V2 in scope and, I guess, thesis or mindset. But can we go back to the naive S3 implementation in the first one, where you just had to do two round trips every time? And can we update that knowledge for V2 on...
Simon Eskildsen [44:16]:
Yes.
Aaron Francis [44:16]:
How does that look in—what version are we on? Still V2? V3?
Simon Eskildsen [44:20]:
We're kind of on V3 in some ways, but I'd mostly talk about V1 and V2. We
started calling it V3 when a customer needed to search on the order of a hundred
billion vectors—
Aaron Francis [44:41]:
—which is hilarious.
Simon Eskildsen [44:41]:
—and that's pretty hard, so we needed adjustments to fit that scale. Anyway, V2 is the more general design—there you're a little bit more civilized about these things, and it's not so tailored to one use case. In V2, you have a key space. So instead of cluster 128 being a JSON file, cluster 128 is a key, and you have to figure out where that key lives. You download some metadata file, and that metadata file has an idea of which files contain which key ranges. I'm really trying to simplify here, right? But say you download LSM.json—this is not how it actually works, but it's pretty close. You get LSM.json, and it says, "These are the files," with some metadata about which key ranges are in which files. Then you go to a file—the file is just named some UUID, some big file somewhere; it's not a JSON file anymore. And you do a round trip to that file to get the index block, which is the last, say, 32 kilobytes of the file. You download that in another round trip, so you're at two round trips now.
Aaron Francis [46:06]:
And why that decision? Are those files so big that it makes more sense to just get the index, calculate which other range you need from it, and then go back and grab exactly that range?
Simon Eskildsen [46:56]:
That's... those files can be gigabytes large.
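Here is that tail-index trick as a sketch: one suffix-range GET for the index at the end of the segment, then one ranged GET for exactly the block that can hold the key. The suffix-range syntax is standard S3/HTTP; the 32-kilobyte figure comes from Simon's example, and the JSON index layout is invented.

```python
import json

TAIL_BYTES = 32 * 1024  # assume the index lives in the last 32 KB

def lookup(s3, bucket, segment_key, wanted_key):
    # Round trip 1: suffix-range GET for the index block at the tail.
    tail = s3.get_object(Bucket=bucket, Key=segment_key,
                         Range=f"bytes=-{TAIL_BYTES}")["Body"].read()
    index = json.loads(tail)
    # e.g. [{"first_key": "a", "offset": 0, "len": 1048576}, ...]

    entry = max((e for e in index if e["first_key"] <= wanted_key),
                key=lambda e: e["first_key"])
    # Round trip 2: exactly the byte range that can contain the key.
    block = s3.get_object(
        Bucket=bucket, Key=segment_key,
        Range=f"bytes={entry['offset']}-{entry['offset'] + entry['len'] - 1}",
    )["Body"].read()
    return block  # decode and search within the block from here
```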
Aaron Francis [47:00]:
Gotcha. Okay, that makes sense. And as you've described it, turbopuffer is still going straight to S3. So do you then put those other stops in the middle—intelligent caching along the way—so that you're not grabbing what was formerly centroids.json off of S3 anymore? That one's probably hot all the time, and the metadata is probably there all the time, so you can get it directly, super fast, and then the caching layers come into play. Is that correct?
Simon Eskildsen [47:29]:
In the simplest form... I mean, the caching is ever-evolving, and the smarter we make the caching, the better performance our customers get, right? You can imagine that keeping that small LSM.json in cache almost indefinitely is really useful—it saves you that first round trip for a very small file—so we could prioritize things like that in the cache. But in the simplest implementation, you can essentially imagine that on the NVMe SSD we create one file the size of the disk—just call the file "cache"—and you load entries into it as a ring buffer. You get one gigabyte, you put it at the beginning of the file; another gigabyte, you put it right after; and you keep going. At some point it wraps around, right? You just start overwriting from the beginning. This is a great way to use a modern disk, because you're just writing as fast as possible. You could add lots of heuristics: when it wraps around and starts eating itself, you could say, "Well, actually, this is accessed a lot, so let's not overwrite it." You could make it arbitrarily complicated. But turbopuffer shipped in the beginning with just the plain ring buffer—nothing smart. I think Justine's implementation was about 200 lines, and it was rock solid.
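An in-memory toy of that ring buffer—the real thing writes into one preallocated file on the NVMe drive, and eviction is implicit: entries that get overwritten simply vanish from the offset map.

```python
class RingCache:
    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0        # next write offset
        self.index = {}      # key -> (offset, length)

    def put(self, key, blob: bytes):
        assert len(blob) <= self.capacity
        if self.head + len(blob) > self.capacity:
            self.head = 0    # wrap: start overwriting the oldest data
        start = self.head
        self.buf[start:start + len(blob)] = blob
        self.head += len(blob)
        # Drop entries whose bytes we just overwrote.
        self.index = {k: (o, n) for k, (o, n) in self.index.items()
                      if o + n <= start or o >= self.head}
        self.index[key] = (start, len(blob))

    def get(self, key):
        if key not in self.index:
            return None      # miss: caller falls through to S3
        off, n = self.index[key]
        return bytes(self.buf[off:off + n])
```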
Aaron Francis [48:53]:
That's awesome. That is so cool. I love a simple solution that just works all the time. So, you talked early on about the turbopuffer website—or rather, about the fashion websites of databases in the 2010s, right? How they don't really tell you what's going on; they just look sexy. So what are the trade-offs of turbopuffer? What are the decisions y'all made that you want to illuminate? Because obviously it's a very different mental paradigm than something like MySQL, Postgres, SQLite, anything like that. So talk to me about those trade-offs and the decisions you made to make this search case actually work.
Simon Eskildsen [49:40]:
The biggest thing is that turbopuffer has very high write latency. If you write
to a MySQL... your write latency is essentially whatever an fsync is. On a
modern flash drive, you can do an fsync in a couple hundred microseconds, maybe
a millisecond if it's a slow disk or network disk. That's pretty fast. And MySQL
and Postgres and so on, you can even get it faster than that because of a
variety of things that they do. But generally, it's sort of in the hundreds of
microseconds ballpark. turbopuffer is two orders of magnitude slower than that.
Our writes take in the low hundreds of milliseconds because that's the latency
you get to S3. If you were building the Shopify checkout with hundreds of
milliseconds of latency on every write in the checkout path, it's just not going
to work. Like, you're just not going to buy anything. And that's the
fundamentally biggest trade-off that we make. The other trade-off, which is more
of a trade-off with the current implementation of turbopuffer than a fundamental
trade-off, is that occasionally you will get a cold query. Occasionally, you
will do a query, and it will go directly to object storage and then puff into cache. That doesn't have to happen, right? We could guarantee that a namespace is always hot on at least one or two nodes, and you could do replicas and things like that, and we want to let users control that kind of thing down the line. But it is a trade-off you make in turbopuffer right now, which means your tail latency could be close to a second. There are things you can do: when you open the Q&A pane in Notion, they send a request to turbopuffer to start warming the cache before the user has even started typing, to get around these kinds of things. And we try to always improve the intelligence of the cache, right—to make sure it's evicting the right things and keeping the things that are used. So those are the two fundamental trade-offs. High write latency means very complicated transactions and things like that—you're just not going to do that here.
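In napkin-math terms, with rough, commonly cited base rates rather than measurements:

```python
# ~200 microseconds for a local NVMe fsync; ~100 ms for a round trip
# to object storage. Both are ballpark figures, not benchmarks.
FSYNC_LOCAL_NVME_S = 200e-6
OBJECT_STORE_PUT_S = 100e-3

ratio = OBJECT_STORE_PUT_S / FSYNC_LOCAL_NVME_S
print(f"object-storage write floor is ~{ratio:.0f}x a local fsync")
# -> ~500x: fine for search ingestion, unusable in a checkout path
```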
Aaron Francis [51:43]:
I buy that. So is cache warming a primitive you expose? So like if I have...
we'll go back to music and say it's like, you know, post-rock shoegaze or
something that doesn't get pulled into the cache very often. Does Notion, for
example, just send off a query for post-rock shoegaze, or is that a primitive
you expose? It's like, "Hey, warm up the cache somehow, some way."
Simon Eskildsen [52:05]:
Yeah, we expose hint_cache_warm—basically a hint to prefetch hot keys, like
madvise for our cache. If it's already in cache it's free; if not, we charge
you for about one query.
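As a purely hypothetical sketch of what calling it might look like—the endpoint path, HTTP method, and auth header here are guesses from the name alone, not turbopuffer's documented API:

```python
import requests

BASE = "https://api.turbopuffer.com"  # assumed base URL

def warm(namespace: str, api_key: str) -> None:
    """Fire-and-forget warm-up, e.g. when a user opens the search pane."""
    r = requests.get(
        f"{BASE}/v1/namespaces/{namespace}/hint_cache_warm",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    r.raise_for_status()
```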
Aaron Francis [52:12]:
Love it—that's a real primitive. So, S3 has dozens of competitors—some very popular, some upstarts—and everybody claims S3 API compatibility. That doesn't matter much for y'all, because it sounds like the underlying performance is what matters. Are there any other object stores you've looked at that offer the same kind of performance—where you're like, "Hey, maybe this is an option with better or at least different performance"?
Simon Eskildsen [52:55]:
turbopuffer started on Google Cloud Storage because Google Cloud Storage had
compare-and-swap before S3 did. I don't know when they got it, but they had it
in 2023, and S3 didn't. We were so adamant about not having a metadata layer—so adamant that early on, when we sold to Notion, who are on AWS while we were on GCP, we bought dark fiber interconnects between the clouds, because we needed this compare-and-swap primitive so badly. And Notion understood. Unfortunately—this was Oregon, us-west-2 or us-west-1, whatever, it doesn't matter—those two data centers in Oregon are physically around five milliseconds apart. With dark fiber, in reality it should be less than that, but let's say five milliseconds. Yet they were showing up as 20 milliseconds apart, because traffic was going through Seattle. I think the AWS one is in some suburb of Portland and the GCP one is out in the boonies, and they were routed through an exchange in Seattle at 20 milliseconds. So we bought a path through an exchange in Portland and set all this up. These were the lengths we went to. And we were tuning so much TCP, because by default, when you request data, you only get about 15 kilobytes back, then you wait a round trip, then you get double, then another round trip—TCP slow start. So we were tuning that on both ends. We did all this crazy stuff because Justine and I have been on call for systems where downtime costs customers millions a minute, and we were not going to hand-wave tail latency on top of object storage. We bought dark fiber, tuned TCP, did whatever it took to make the path fast enough. Now, of course, we don't have to do as much of this, because we also run on S3 in AWS. But to go back to your question: there's Google Cloud Storage, there's S3, and there's Azure Blob Storage—we're starting to work with customers on Azure. I don't have too much to say about Azure Blob Storage yet, but between GCS and S3, my main observations are that the tail latency on S3 is higher, while the small-object latency on S3 is lower—about eight milliseconds. If you're getting a very small object, S3 is faster than GCS; GCS is around 15 to 20 milliseconds for small objects, but its tail latency is better. Other than that, they're very comparable systems in our eyes.
Aaron Francis [55:25]:
So you've already gotten to the other big clouds, which leaves, I guess, the many others. I know DigitalOcean has a compatible one, Cloudflare has a compatible one. I just talked to Tiger Data, and by the time y'all are listening to this, that episode will have been out for a week or so—so that's another one. But it sounds like you've covered the big ones your customers are looking for right now. How hard is it to maintain? How different are those primitives? Do you just have adapters somewhere in the code that say, "Hey, I'm running on GCP, do this one thing slightly differently"?
Simon Eskildsen [56:01]:
Yeah, we have our own client that we've written that works with all of these object stores, and I think you want to own your own client when you're doing this kind of stuff. There's a lot you have to do, like signing the requests—that was two days of my life I'll never get back. And there are a lot of small differences, but those are just if-else statements. Azure Blob Storage is particularly annoying, because it does not implement the S3 XML spec, so you have to do something completely different. In terms of others, it will come from customer demand, right? If customers ask for them, we will add them. Originally, turbopuffer only worked on top of Cloudflare R2 and Cloudflare Workers—believe it or not, that was the absolute first version. There's probably still some WASM-era code sprinkled around the codebase. That's actually why I had to write my own S3 client: the existing ones didn't work well in WASM. But I couldn't get it fast enough there, so I moved on to this architecture.
Aaron Francis [57:06]:
So that's up to four major clouds that you have at least at one point written it
for. That's kind of wild. So where do we stand right now? So turbopuffer, who do
you serve? Like what workload, use case, business type do you serve the most?
And what do you see in the next six months, a year for turbopuffer? Where do you
want to go?
Simon Eskildsen [57:30]:
We just want to get the whole world puffing—I should be hitting a vape on this
call for full effect. Anyway—look, you can compile all of the world's knowledge
into a couple terabytes of weights, and that model is going to have a very good
idea about how to reason with the world. But in order for a model to reason with
the outside world or the inside world of a corporation, it needs to search. That
is the most important tool that it has in its arsenal. And turbopuffer is going
to index all of the private data in the world that wants to be connected to AI.
So when we say that people want to connect data to AI, that's what we do. We also connect humans to data at scales that were very, very difficult before. And we see that across so many different types of businesses, right?
There's code, there's legal, there's hedge funds that use us, there's all kinds
of businesses that have massive amounts of unstructured data. And that
unstructured data, everyone is trying to get value out of, and that's what AI is
so good at. And it's what search is really, really good at sifting through. The way I think about it: you give turbopuffer the haystack, and we give you the chunk of hay that the needle is in. The LLM is then very good at taking that chunk, finding the needle, and getting value out of it.
Aaron Francis [59:00]:
That's a fantastic analogy, by the way. That's great. Okay, that makes sense to
me. All right, so what do you see like product-wise? Give me a sense of where
turbopuffer is at and any sense of scale you want. Employees, anything public
you want to share, like give...
Simon Eskildsen [59:16]:
Yeah, we're just shy of 20 people. We are a very focused team of people who just love to build databases, and who love helping people build amazing things with those databases. Those are the kinds of people who I think would love working here, and who are working here now. We power more than a trillion vectors. I haven't heard of anyone else at that scale—I'm sure the hyperscalers are around it—but to give you a sense of scale, the entire public internet is in the hundreds of billions, depending on how you count it. So this is extremely sizable scale. At peak, we do more than 10 million vector writes per second, and tens of thousands of queries per second—which to me is not as impressive, having worked on large MySQL clusters that do millions per second. But these are the kinds of numbers we operate at. This is real scale—on some attributes, larger than what I worked with at Shopify. And we have many, many customers who work with us and trust us.
Aaron Francis [1:00:28]:
Okay, so you're two-ish years old, 20 people doing huge scale. I want to talk
about where you're going, but just on the employee thing, there are a lot of
nerds listening to this. Are you actively looking for people? And if so, what
types?
Simon Eskildsen [1:00:43]:
We are always looking for people to join the team. We're looking for customer
engineers. We're looking for sales. We're looking for database engineers. We're
looking for people to work on the dashboard—more or less every role for someone
who wants to help build a pretty database.
Aaron Francis [1:00:58]:
Love it. People can track you down if they're clever—but they should really go
through jobs on the site, right?
Simon Eskildsen [1:01:10]:
You should be able to find me, and I think you will have a higher probability of
getting a great answer if you go to jobs on the website because then it doesn't
land in my inbox, which is getting a little bit hard to stay on top of.
Aaron Francis [1:01:23]:
Fair enough—you all heard that; hit the jobs page if you're interested. What's
the roadmap for the next six months or year? You want to get the whole world
puffing—are you mostly refining and hardening the core, or is there more
search-adjacent surface area you plan to pull in?
Simon Eskildsen [1:02:03]:
Look, right now we are focused on building the best search engine and making it scale. So we work a lot on performance. We look closely at our customers' query plans and expand what we support. We're very focused on more full-text search features—one of the people we just hired has been committing to Lucene for more than 10 years—and we're adding more and more text features to the product. We're also working on puffing up the dashboard and all the UX around the product. Right now, if you log into the turbopuffer dashboard, you'll feel like it was vibe coded by a 14-year-old. And that's about right: the initial version was written by me over a few months early on, and an amazing support engineer has worked on it since, in between answering customers. But it's starting to get some love, and I think people should get really excited about that. Really, though, we want to build a really good search engine. Of course we want to expand into helping with many of the workflows around search, but we're focused on creating an incredible product at the core before expanding with more auxiliary offerings around it.
Aaron Francis [1:03:24]:
Okay. Yep, that makes a lot of sense to me. This has been great. You're really
good at explaining all of this highly technical stuff. So well done you. As we
wrap here, tell the people why they should consider turbopuffer and when they
should consider turbopuffer. This is your like no holds barred, give us the
pitch.
Simon Eskildsen [1:03:50]:
If you are searching data and you don't have that much data, then you should not
be using turbopuffer. You should just do whatever you have right there, put it
in pgvector or whatever you're already running—until scale and economics push
you toward a specialized search layer.