
Author: Sarah Catanzaro
Date: [Insert Date Here]
Quick Insight: Sarah Catanzaro explains why the dbt-Fivetran merger signals a pivot toward IPO readiness rather than a category collapse. Builders must move from human-centric discovery tools to machine-centric governance and memory systems to survive the next phase of AI.
Sarah Catanzaro of Amplify Partners joins Latent Space to dissect the messy intersection of data infrastructure and the AI gold rush. She argues that while the "Modern Data Stack" is evolving, the real value lies in solving the gnarly research problems that enable stateful, personalized agents.
"The merger was a way to accelerate that path to liquidity... they were the presumptive winners anyway."
"Valuation until a company exits is an entirely made-up number."
"The best RL environment is the real world."
Podcast Link: Click here to listen

Okay. We're here with Sarah from Amplify. Welcome. Thank you. First time being here. I know. We've known each other for so long. Yeah. Never made an appearance. And also made the transition from data to AI. I guess I don't know if I did. I don't know if you were always as deep on AI. Um, but obviously there's a lot of simpatico.
Yeah, I've always actually kind of oscillated between data and AI. Sure. Arguably I started my career in quote-unquote AI. It was just more like symbolic systems back then, but as you said, they're so symbiotic it's almost hard to divorce them. That's actually what brought me into data. I was like, I want to better understand what happens when I write a SQL query. So yeah. Let's briefly touch on data, because I think that's a lot of where you and I first met. dbt-Fivetran, that was so cool. I mean, or, yeah, how do you think about the end of the modern data stack?
Okay. So a lot of people look at the dbt-Fivetran merger and talk about the end of the modern data stack, and I think that is a fundamentally wrong take. Both of these companies were growing, you know, very healthily. Both of these compa... you funded dbt? We funded dbt. So both of the companies were actually beating their revenue targets. I think what you're seeing is more an IPO environment wherein companies are expected to have far more than, you know, $100 million in revenue.
And so what would you say the bar is now? 300? No, like above 600. 600. Yeah. Yeah. And the combined company is at 400. I believe they'll actually be close to 600. I don't have the exact number, but they're clearly just getting ready for IPO. So, you know, basically the merger was a way to accelerate that path to liquidity, as you might remember, and they were the presumptive winners in their categories anyway. So, exactly. Exactly.
You know, I think one of the things that has actually pleasantly surprised me, and this speaks again to the symbiotic relationship between data and AI, is that many of the big frontier labs are actually using both dbt and Fivetran. I recall talking to folks at Thinking Machines within weeks of the company's formation, and dbt was already an important part of their stack. It makes sense: training data sets need to be managed, we need insight into what users are doing on these platforms, and in fact the way in which you would analyze interactions with an agent or with an LLM is even more complicated. And so, you know, while I think perhaps the demand for analytics engineers and the demand for data scientists didn't explode in the way that some people thought, analytics engineers are not one-third of personnel, that doesn't actually mean that the demand for the tools is not still very prevalent.
Well, you got what you wanted. You wanted to democratize things. You got it. Yeah. Yeah. I mean, I guess we democratized things by perhaps reducing the need for the people. I don't know whether or not that is a good thing, but honestly, I do think that the fact that it is easier than ever, from a tooling standpoint, for people to make data-driven decisions is probably a step in the right direction. And I've become convinced that while every company does need analytics engineers and does need data scientists, they probably don't need armies of them. Having a moderately sized data and analytics team is probably a good thing.
So, you touched on an interesting thing I wasn't planning to ask, but this is interesting. I come from the data field. Data was synonymous with analytics. Yeah. But you're now saying that dbt and Fivetran are being used for training data. Are there any notable differences in the workloads or the requirements?
Undoubtedly there will be. I mean, I think one of the things that we saw with analytics that was surprising to some of the people in the data infrastructure space was that the workloads were actually quite predictable. They were quite predictable because many of them were not being generated by humans but rather by deterministic systems. A lot of it was, you know, BI dashboards, Tableau actually hitting your database, or maybe not Tableau but Looker, or, you know, Hex or something like that.
I think with analyzing, curating, and preparing data sets, it's a bit more ad hoc, and so undoubtedly it will be less predictable. I don't know if that really changes the way that we approach developing data infrastructure. I've talked to some people who are still quite interested in things like learned indexes and learned optimizers, and it's a bit easier to build a learned optimizer if you have more predictable workloads, so it could change the way that we approach things like that.
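For context on the "learned indexes" Sarah mentions: the idea is to replace a classic structure like a B-tree with a model that predicts where a key sits in sorted storage, then correct the prediction with a bounded search. Here is a minimal, purely illustrative sketch (plain least squares, not any production system):

```python
import numpy as np

class LearnedIndex:
    """Toy learned index: a linear model predicts a key's position in a
    sorted array, then a bounded local search corrects the prediction."""

    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, dtype=float))
        positions = np.arange(len(self.keys))
        # Fit position ~ slope * key + intercept with least squares.
        self.slope, self.intercept = np.polyfit(self.keys, positions, deg=1)
        # Track the worst-case prediction error so lookups stay exact.
        preds = np.round(self.slope * self.keys + self.intercept).astype(int)
        self.max_err = int(np.max(np.abs(preds - positions)))

    def lookup(self, key):
        guess = int(round(self.slope * key + self.intercept))
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the small error window around the guess.
        idx = lo + int(np.searchsorted(self.keys[lo:hi], key))
        if idx < len(self.keys) and self.keys[idx] == key:
            return idx
        return None

# Smooth, predictable key distributions make the model accurate.
keys = np.cumsum(np.random.default_rng(0).integers(1, 5, size=100_000))
index = LearnedIndex(keys)
assert index.lookup(keys[12_345]) == 12_345
```

The predictability point is exactly why this matters: the more stable the key and query distribution, the tighter the error window, and the bigger the potential win over a general-purpose index.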
Yeah. Data catalogs, do they become more important, or not? Oh man, straight to the gut. So that was something I got wrong. I'm sorry, I don't know the background. What did you... I just really believed that data catalogs were going to become an important part of, you know, the modern data stack. And the players are Atlan, uh, she's Singaporean, so... Yeah. Yeah. There was data.world, Metaphor within our portfolio. They've all struggled as a category. They all have struggled a bit as a category. Many of them have been, you know, acquired subsequently, which suggests that this was perhaps not a standalone category. As a data scientist, I spent so much time working on data catalogs. And so, you know, I kind of felt like this was the thing I wanted, like I didn't want to have to build it myself. Yeah. Or, to the point, also with pre-training data you have a lot more heterogeneous data all over the place.
Yeah. And you need to keep on top of it, and you need to make it discoverable, accessible, and all that. So why didn't it work? So I think there were a couple of things. I think we have seen some consolidation in the modern data stack, particularly around some of the key components, whether it was Fivetran or dbt or Hex or Snowflake. Many of these products offered data cataloging capabilities as a feature, and I think for humans that was good enough. The data catalog you had available in Snowflake was good enough; the data cataloging capabilities available in dbt, those were good enough. They did. dbt, obviously, once they built the cloud, they were going to build it. Yeah, like what else do you do? It's actually funny, in fact, my colleague Bar at Amplify was the product lead on these kinds of metadata services.
I think it's still not obvious to me, but I think one opportunity that might have existed, and/or could have been realized, was the opportunity to build data catalogs not for humans but for machines. This would look a little bit more like metadata services. I don't just mean for agents, although I think that opportunity is arising more, but even for microservices and things like that. Okay, yeah. So I do wonder at times if we built data catalogs for the wrong people, and potentially even for the wrong use cases. I think a lot of data cataloging companies ended up focusing on discoverability when perhaps the real market opportunity was in governance. Governance, very important.
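To make the "catalogs for machines, governance over discoverability" idea concrete, here is a hypothetical sketch of what a machine-facing metadata service might look like. Every name, field, and policy below is illustrative, not any vendor's API: the point is that a program (a pipeline, a microservice, an agent) resolves a dataset and governance rules are enforced at lookup time rather than documented in a UI.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    name: str
    owner: str
    schema: dict[str, str]                                # column -> type
    contains_pii: bool = False
    retention_days: int = 365
    allowed_uses: set[str] = field(default_factory=set)   # e.g. {"analytics", "training"}
    upstream: list[str] = field(default_factory=list)     # lineage

class MetadataService:
    """Hypothetical metadata service consumed by machines, not humans:
    callers request access programmatically and policy is checked here."""

    def __init__(self):
        self._registry: dict[str, DatasetMetadata] = {}

    def register(self, meta: DatasetMetadata) -> None:
        self._registry[meta.name] = meta

    def resolve(self, name: str, purpose: str) -> DatasetMetadata:
        meta = self._registry[name]
        if purpose not in meta.allowed_uses:
            raise PermissionError(f"{name} is not approved for '{purpose}'")
        if meta.contains_pii and purpose == "training":
            raise PermissionError(f"{name} contains PII; blocked for training")
        return meta

# A pipeline or agent checks governance before ever touching the data.
svc = MetadataService()
svc.register(DatasetMetadata(
    name="events.user_clicks", owner="growth",
    schema={"user_id": "string", "ts": "timestamp", "url": "string"},
    contains_pii=True, allowed_uses={"analytics"},
    upstream=["raw.web_logs"],
))
meta = svc.resolve("events.user_clicks", purpose="analytics")   # allowed
# svc.resolve("events.user_clicks", purpose="training")         # would raise
```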
Any other comments about what you know so far about the data stacks of the large labs? I guess a lot of data people who might be listening would want to sell into them. Yeah, I mean, a couple of observations. One is that they are actually paying careful attention to their data stacks. I think they're thinking about problems ranging from data discoverability to data preparation to even things like the efficiency of data loading. If you're unable to load data to a GPU efficiently, then the GPU is going to sit idle and that's going to be a cost. Yeah. Yeah. Exactly.
So what solutions exist in that space? I mean, do I get to plug my portfolio companies? Yes, exactly. We have a portfolio company called Spiral that has developed a file format called Vortex, and they make data loading super efficient, specifically to GPUs. Specifically to GPUs. Okay. Yeah. Yeah. Good to know.
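The GPU-idle point is general, whatever the file format underneath: the standard mitigation is to overlap data loading with compute so the accelerator never waits. A minimal PyTorch sketch of that pattern (this is not Spiral's Vortex API, just the generic prefetch and pinned-memory recipe):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RandomVectors(Dataset):
    """Stand-in dataset; in practice this is where a scan-friendly columnar
    format (the role Vortex plays) would feed decoded tensors."""
    def __len__(self):
        return 10_000

    def __getitem__(self, i):
        return torch.randn(1024), torch.randint(0, 10, (1,)).item()

def main():
    loader = DataLoader(
        RandomVectors(),
        batch_size=256,
        num_workers=4,       # decode/prepare batches in parallel worker processes
        pin_memory=True,     # page-locked host memory enables async host-to-device copies
        prefetch_factor=2,   # each worker keeps batches queued ahead of the GPU
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(1024, 10).to(device)

    for x, y in loader:
        # non_blocking=True overlaps the copy with compute when the source is pinned.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        loss = torch.nn.functional.cross_entropy(model(x), y)

if __name__ == "__main__":   # required for multi-worker loading on spawn platforms
    main()
```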
One of the things that has surprised me, though, is that so much data infrastructure has actually scaled quite elegantly to meet the AI use case. You would hope it would, but the scale of these AI companies is incredible. It's not as big as ads. Maybe, maybe. Yeah. I think that could change, you know, as agents actually become more prevalent and are interfacing with each other, and therefore the number of transactions explodes. I have a friend who works on transactional databases at OpenAI. And I was like, so you must be building databases, this is a paradigm shift in terms of the scale that databases are going to need to handle. And he's like, no, we use Rockset. Like this one, right? Yes. Exactly. Yeah. Yeah. Very cool.
Okay, let's just talk about the funding environment, because obviously that's a big theme this year. What comes to mind looking back at 2025? What stands out? It was crazy. Yeah, can you give anonymized examples of what crazy looks like?
Yeah. I mean, I think crazy looks like raising upwards of a hundred million dollars in a seed round where you have a long-term vision but not a near-term road map. Yeah. This is something that I'm seeing happen not just occasionally but quite frequently. Yes. And it definitely makes me anxious, because when founders are asking me, you know, how much should I raise, I'm typically saying, like, three, like, five... well, what do you need to do? What are your milestones for the next, let's call it, 12 to 24 months? What resources do you need in terms of headcount, compute, equipment to unlock those milestones? And then maybe add a 20% buffer or something like that.
But doing that analysis requires you to understand what you're going to build in the next zero to, let's call it, 24 months. And I've talked to some companies and they're like, we're building a frontier lab for X. And I'm like, okay, cool. I get the long-term vision. There is an opportunity to make AI more secure, make AI more humane, make AI more data efficient, whatever it might be. So I'm bought into the long-term vision, and that, for me as an investor, is super important. So let's talk about what your team is going to work on in the next six months. They're like, uh, maybe we might build a consumer app. I feel like I know exactly the company you're talking about. But I wish I was talking about one specific company; I'm actually talking about several companies.
And look, I'd be a hypocrite to say that I've never done investments like that. But I've done investments like that when I really know the people and I'm like, they're going to figure it out. What is frightening about this funding environment is that you meet a founder and they're like, I'm raising $100 million, I'm raising a billion dollars maybe at times, and you need to make a decision in seven days, and I can't tell you what I'm going to do for the next six months. Conviction. I think what some of the founders are missing is, you only have seven days to get to know me. If you haven't figured it out, you probably want a partner who's going to be working closely with you to help you figure it out.
I mean, they're absolutely viewing it as transactional, right? Like they don't care. No, they care about, you know, the most money at the highest valuation. I mean, the crazy thing is that they don't even seem to care about dilution. It's just the most money at the highest valuation. Yeah. And, you know, it does send a signal that helps. So, I mean, yes, I think it does right now send a signal. Okay, I'll tell you how it affects me, and I hate it. I hate it. All right. Antithesis came out of stealth this week, right? And the only thing I know about them is that they do something in AI testing and Jane Street led a seed round of $100 million. We invested in it too. I can tell you what they do: they do deterministic simulation. The thing that leads, though, is the money.
Yeah. And then, okay, well, who else uses it other than Jane Street? Like, what do you do that's innovative? Palantir. Okay. WarpStream. Yeah. So yeah. Okay. Anyway, so maybe Antithesis is a bad example because they're actually legit, but there are a lot of similar examples where they just lead with the money and there's not much substantiation behind it. Maybe it's just bad storytelling, and that's why I, as a podcaster, get to... I just talked to General Intuition, and once you spend some time with them, then you're like, oh okay, this is why they raised $100 million. But without that context it's really hard to understand anything.
Well, and I think there are some companies that are raising a hundred million dollars or more because they need it. A good example might be Periodic: in addition to everything else, they need to build out a wet lab, and designing a wet lab that can support high-throughput biology, which is absolutely critical to their goals, is costly. So I understand why they need that funding. But again, there are others that don't have these near-term milestones. The thing that is a little bit perturbing to me is that many of them are doing it because it makes it easier for them to hire, because there are all of these candidates who want to work at a company that is a unicorn or a near-unicorn.
They're pitching because the alternative is to work at a big lab, where, you know, the prestige and the money is there. Yeah. Well, or the alternative is to work at an early-stage startup. But there's something about the big valuation that becomes enticing. They're also pitching candidates with a compelling equity pitch: okay, maybe you're getting less than, like, 1% of the company, but given the valuation, the value of your equity is already, you know, $10 million or something like that. And they also guarantee a dollar value for the equity. Yeah. You mean they'll offer them a loan, or a buyback, if you want to sell it? Yeah.
But because they have so much cash... The thing, though, is that the valuation is a made-up number. Valuation, until a company exits, is an entirely made-up number. So I could just say, you know what, the Latent Space pod, that is worth $5 billion, and we could agree. I as an investor could say that is the price, and now the company is worth $5 billion. Like, do you think that... Yeah, it's not real; it's not actually traded in any volume. And given the funding amounts that they're raising too, if they spend that and then get acquired for less than that amount, their teams are getting nothing. I wish people were more sensitive to this dynamic and thought more about what the upside associated with the company is, and more fundamentally, do I deeply believe in this vision. Because joining a company just because it has a billion-dollar valuation is not the right way to choose a job. I hear you.
Okay. So obviously we could go on about that forever. Yeah, and there's also some stuff with cyclical funding and all that, but I do want to be more relevant to engineers and researchers. Yeah. What are the themes that are really strong? So one thing I'll point out is world models. Oh yeah. World models in general are a really strong bet, I would say. So every year I go to this group of researchers and we take a vote on the top themes of the year. Everyone's extremely skeptical about world models. I think it's a trailing indicator, because LLMs have been so enormously successful. You're like, I don't need anything else. I don't know if you have a take on world models or any other top theme of the year.
My take on world models is that we have not yet defined what a world model is. Oh yeah, there are like three definitions right now. Yeah, I think there's a lot of confusion about what a world model is and therefore what it should be used for. We're already seeing plenty of market potential for video models, including for things like, perhaps, B-roll and video editing. I think we're already seeing some applications of world models to things like autonomous driving and potentially even coding. But again, it really hinges upon how you're defining world models. And I think one challenge that people have seen is that world models designed for one specific use case might not generalize to others. As an example, world models for video game generation might not generalize to factory settings or robotics. I use the word "might" strategically, because I think it is potentially a research problem that might be figured out. Yeah. That's part of the General Intuition podcast that we did; they had some evidence. Yeah. I think it is possible. It's just that we're not there yet today.
Yeah. A theme that I've been spending a lot of time thinking about is memory management and continual learning. There are a lot of startups in that same space, I think. Okay. I think I know what startup you're thinking about as well. But I actually see a lot of market potential for memory management and continual learning. My interest in this is actually more driven by conversations with practitioners. Personalization is so important right now. I think what we're seeing is that a lot of AI application companies are growing really quickly, but they suffer from relatively low retention, relatively high churn. So if you're developing an app like Cursor, how do you ensure that your users don't switch over to, you know, Windsurf, or Claude Code, or Cognition, or whatever else?
Yeah. Cursor rules isn't enough, right? It's like the shittiest form of memory. Yeah. You know, and it's great. But yeah, I agree with that, but also, I've publicly mused about this before: memory is very poorly implemented today in a lot of surfaces, even ChatGPT. I wouldn't say people are particularly excited about it. Okay. All right. You feel more strongly about it than I do. Yeah. Yeah. I mean, I wish ChatGPT had, you know, much better memory. Yeah. Like, has this been the leading one? I don't know. And then I think, in general, it makes product management harder, because what is the product? It's a combination of you plus memory. And when you have a bug, is it the memory or is it something core? And as a user, especially if it's consumer, there's going to be zero patience for any of this.
I agree. But that said, consumers seem to be tolerating products with no implementation of memory today. So I think better is still probably better than what exists now. Better is better than nothing. I guess, would you agree with the statement that a key theme of 2026 is this personalization? I would call it kind of like the consumerization of AI, in the same way that consumerization of the enterprise was a trend like 10 years ago. Yeah. I mean, I think that is a good way of putting it, too. For what it's worth, I don't think this is just a consumer or prosumer phenomenon. If you are in an enterprise that is adopting, again, Devin or Augment or something like that, you probably also want your models to, like, learn.
Yeah. Like you start to care about, like, K factor. I had to explain what that is to so many founders, and, you know, if you're in normal SaaS this is what you obsess over, and AI founders are like, what do you mean? Growth just shows up. Yeah. Yeah. I mean, it has, though. But I think it has because, for a while, AI has just felt magical, and now we're getting more accustomed to the magic and it's no longer enough. And I think we need to revert to some of the old tips and tricks for retaining people and bringing them in. Personalization is one of them. I always kind of intermingle memory and continual learning, because I think one interesting element of personalization is not just learning, you know, facts about you or your preferences, but actually learning new skills from interactions with you, and learning as the world changes. There are new versions of languages and frameworks and other repos coming out all the time.
The world is changing all the time. Human intelligence is incredibly dynamic, and yet artificial intelligence is just so static today. But then it must update weights for you. But that also means it's an interesting systems problem, because if you must update weights, then weights become stateful, and today inference is not stateful. So I think there are going to be a lot of fun, gnarly problems to figure out as we figure out things like personalization and continual learning. That's also a fascinating infrastructure problem, because you have to load and unload and, you know, cache, and all the good stuff.
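One way teams work around "inference isn't stateful" today is to keep the base model frozen and attach small per-user weight deltas, LoRA-style adapters, that get loaded, applied, and evicted per request. A simplified, hypothetical sketch of that load/unload/cache dance, with all names and shapes illustrative rather than any product's implementation:

```python
import torch
from collections import OrderedDict

class AdapterCache:
    """LRU cache of small per-user weight deltas for one frozen linear layer.
    The shared base weights never change; only the tiny per-user factors do."""

    def __init__(self, base: torch.nn.Linear, rank: int = 8, capacity: int = 100):
        self.base = base
        self.rank = rank
        self.capacity = capacity
        self._cache = OrderedDict()  # user_id -> (A, B) low-rank factors

    def _load(self, user_id: str):
        # In production this would fetch trained adapter weights from storage;
        # here we just initialize fresh factors A (out x r) and B (r x in).
        if user_id not in self._cache:
            out_features, in_features = self.base.weight.shape
            a = torch.zeros(out_features, self.rank)
            b = torch.randn(self.rank, in_features) * 0.01
            self._cache[user_id] = (a, b)
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict the least-recently-used user
        self._cache.move_to_end(user_id)
        return self._cache[user_id]

    def forward(self, user_id: str, x: torch.Tensor) -> torch.Tensor:
        a, b = self._load(user_id)
        # Base output plus the user-specific low-rank correction A @ B @ x.
        return self.base(x) + x @ b.T @ a.T

    def update(self, user_id: str, grad_a: torch.Tensor, grad_b: torch.Tensor, lr: float = 1e-3):
        # "Continual learning" here means updating only the per-user factors,
        # never the shared base weights.
        a, b = self._load(user_id)
        self._cache[user_id] = (a - lr * grad_a, b - lr * grad_b)

base = torch.nn.Linear(512, 512)  # stand-in for a frozen base model layer
serving = AdapterCache(base)
y = serving.forward("user-42", torch.randn(2, 512))
```

The load/unload/cache problem Sarah describes is exactly what the LRU eviction stands in for: at serving scale you cannot keep every user's state resident, so personalization becomes as much an infrastructure question as a modeling one.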
One more thing, I think we have time for one more take, on RL environments, a huge topic. Is it just a Docker container with some custom software loaded, logging stuff out? What are the good ones like, and what are the average ones like?
So, I know I'm going on record on this, and I'm actually okay with being wrong, but I think RL environments are just a fad. Oh god. Oh no. They're all fake. I mean, okay, the thing that makes me take it seriously: the labs I know are paying seven, eight figures for RL environments from other companies, and they could build them in house. They're not, and I don't understand why. I mean, they were paying seven to eight figures for piss-poor data annotation too. Yeah. And data labeling before that; the labs have a lot of money. I think perhaps RL environments could create some value in the short term, but to the point about what makes a good RL environment and what makes a bad one, I think the best RL environment is, you know, the real world. Why would I want to buy a DoorDash clone when I can just use logs and traces from DoorDash itself? That doesn't mean we don't need to work on this in parallel.
Yeah. I mean, I think using the real world, using real apps, as the RL environment is in fact the best thing, and this is what Cursor does. They actually use real user activity on their platform to significantly improve both their coding agents and Tab, and I think it's one of the approaches that has made the platform so compelling. It doesn't mean there's nothing to design: you still need to figure out the right rubrics, you still need to figure out the right set of tasks. So there are some aspects of RL environment design, at least as we're talking about it today, that I think are going to remain incredibly relevant, but just building a clone of an app, I think, is not that useful. Yeah. Yeah. Okay, that's a take, I'll take it. We have maybe three minutes for any other stuff you're thinking about: the state of startups in general, the state of funding.
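Mechanically, the environments being sold are often close to the host's description: a containerized app, a task spec, a logged trajectory of tool calls, and a rubric that scores it. A stripped-down, hypothetical sketch of that interface, independent of any vendor:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    prompt: str
    # Each rubric item maps a description to a check over the logged trajectory.
    rubric: dict[str, Callable[[list[dict]], bool]]

@dataclass
class AgentEnv:
    """Gym-style loop around an app: the agent emits tool calls, the env
    executes them (in a sandboxed container in practice) and logs everything."""
    task: Task
    trajectory: list[dict] = field(default_factory=list)

    def reset(self) -> str:
        self.trajectory = []
        return self.task.prompt

    def step(self, tool: str, args: dict) -> dict:
        # A real environment would dispatch to the containerized app here;
        # this stub just records the call, which is what the rubric scores.
        observation = {"tool": tool, "args": args, "ok": True}
        self.trajectory.append(observation)
        return observation

    def score(self) -> float:
        # Reward = fraction of rubric items satisfied by the trajectory.
        checks = [check(self.trajectory) for check in self.task.rubric.values()]
        return sum(checks) / len(checks)

# Toy task: "place an order", scored on whether the right calls were made.
task = Task(
    prompt="Order a burrito and pay with the saved card.",
    rubric={
        "searched menu": lambda t: any(s["tool"] == "search_menu" for s in t),
        "placed order":  lambda t: any(s["tool"] == "place_order" for s in t),
    },
)
env = AgentEnv(task)
env.reset()
env.step("search_menu", {"query": "burrito"})
env.step("place_order", {"item": "burrito", "payment": "saved_card"})
print(env.score())   # 1.0: both rubric items satisfied
```

Sarah's argument is that the container and the stub are the easy part; the rubrics and the task distribution are what matter, and real logs and traces from a production app give you both, which is why she's skeptical of selling clones.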
Yeah. So maybe I can talk about the archetype of startup that is most exciting to me. Yes, for startups, yeah. I love investing in, you know, infra, tools, platforms, etc. And as we talked about with continual learning, I think there will be opportunities for new tools, platforms, and infra in the future. I've spent a lot of time thinking about applications today, and specifically the relationship between research and applications. An example of this: I think there were a lot of advances in RAG, and the biggest beneficiaries of those advances were the application companies for whom retrieval was a critical unlock. So, as an example, Harvey. Ha, I knew you were going to say Harvey. Yeah, I mean, they have really interesting RAG implementations, they have hired researchers, really good researchers, to advance the state of the art, and that enables them to build a better product. I feel this way very much about rule following and customer support. Rule following is a hard research problem, but if you solve rule following, then you unlock better customer support, and I think a lot of Sierra's success can be attributed to their focus on this.
So I've been thinking about, even for something like continual learning or memory, what is the killer use case where you can either offer a dramatically better experience by having a good memory implementation, or do something that was just not possible before. I think you can also think about this in the inverse, and often the best companies emerge in this way. They're like, "I'm trying to do this thing, but in order to actually do it, I need to solve this hard technical problem." That's kind of the story of Runway. I don't think they would have built models if they didn't have to. But I love that combination of: we're delivering something that is better for consumers, better for users, but we're doing so by solving these really gnarly research and engineering problems. Yeah. God, there's so much I want to dig into there, but we're short on time. Just thank you in general. I don't know if you have a general call to startups, or a page somewhere you want to point people to. Twitter, whatever it's called. Yeah, you can find me there, or in South Park with the one-eyed dog. I'm easy to spot. Okay. Well, thank you so much for your time. I know you've got to go, but I appreciate it. Of course. It was great seeing you, and thanks for having me. Yeah. Thanks.