Turing Post
February 6, 2026

Why the US Needs Open Models | Nathan Lambert on what matters in the AI and science world

How Open Models Solve AI's Geopolitical & Innovation Dilemma by Turing Post

Nathan Lambert, a research scientist at the Allen Institute for AI and a vocal open model advocate, cuts through the hype to reveal the strategic importance of open models. He argues that while closed models grab headlines with cutting-edge performance, open models are quietly laying the groundwork for the next decade of AI innovation, particularly for nations aiming to secure their technological future.

Quick Insight: Open models are the unsung engine for future AI research and a geopolitical imperative for the US to maintain its innovation edge. They offer a transparent, cost-effective path to AI adoption for enterprises and global economies, despite lagging frontier closed models by months.

  • 💡 Why are open models crucial for US AI leadership, even if they lag behind frontier closed models?
  • 💡 What specific technical and economic challenges do open models face, and how are they being addressed?
  • 💡 How do open models fit into the broader geopolitical and societal impact of AI, beyond just performance benchmarks?

Top 3 Ideas

🏗️ The Research Engine

"They're going to be the engine for the next 10 years of AI research."
  • Academic Platform: Academia currently experiences a lull in AI influence. Open models provide a vital platform for scientific exploration and innovation that companies cannot nurture.
  • Knowledge Base: Think of open models as the academic textbooks and tools that empower universities and smaller companies to learn, experiment, and build foundational knowledge. This ensures a broad base of innovation and talent development, even if they do not invent the next big thing first.
  • Geopolitical Edge: China's rapid advancement in open models, driven by top talent and permissive licensing, positions them to capture future AI value. The US must intentionally invest in its open ecosystem to avoid ceding this critical influence.

🏗️ The Performance Reality

"The best open models are some 6 to 9 months behind the best closed models and that's fine."
  • Resource Disparity: Frontier closed models like Opus and GPT-4.5 are demonstrably superior, a gap largely due to massive compute and proprietary data resources. This means enterprises seeking peak performance often opt for closed solutions.
  • Costly Freedom: While "open" implies free, deploying and fine-tuning open models requires significant compute investment. This cost can be prohibitive for smaller entities, highlighting the need for efficient inference solutions.

🏗️ Strategic Imperative

"Owning the engine for AI innovation for decades to come and being the central source of influence in AI research."
  • Sovereign Control: Enterprises and nations increasingly want to own their AI stack for information security and predictable costs. Open models provide this control, reducing reliance on external APIs and mitigating geopolitical risks.
  • Frictionless Innovation: A robust US open model ecosystem removes friction for domestic startups and researchers. This ensures that the pipeline from academic research to commercial application remains within US control, preventing value capture by foreign entities.

Actionable Takeaways

  • 🌐 The Macro Shift: Geopolitical competition in AI is not just about raw model power; it is about who controls the foundational research and development platforms. Open models are the battleground for long-term national AI sovereignty.
  • The Tactical Edge: Invest in open model research and infrastructure, particularly in post-training environments and high-quality data generation. This builds a resilient, transparent AI ecosystem that can adapt and innovate independently.
  • 🎯 The Bottom Line: The US must prioritize open model development now to secure its position as a global AI leader, foster domestic innovation, and provide accessible AI options for a diverse global user base over the next 6-12 months.

Podcast Link: Click here to listen

They're going to be the engine for the next 10 years of AI research. I think pre-training data is the hardest legal part to get open. The timeline on robotics seems too soon. Musk Industries, for better or worse, has a bit of villain vibes.

Hello everyone. I'm very happy to be hosting Nathan Lambert, research scientist at the Allen Institute for AI, one of the best specialists in the field, an open model advocate, and a very well-articulated educator. I think I was one of your first paying subscribers back in 2022, so you can tell I'm a big fan.

Yeah, it's been fun. We've been crossing paths for years now on the web and in real life. So, it's fun to join the pod.

When you were starting your career, could you imagine that you would become a celebrity? I was just watching your podcast with Lex Fridman.

No. And I talk about this with my partner and family regularly. Largely due to the dynamics of the AI industry, things have evolved so fast. There are just so many people who are very articulate and good educators that went to these labs to, understandably, go all in on building AI. And with the stakes involved, people at these labs don't tweet publicly that much (OpenAI is a whole special thing), so a lot of these communicators kind of can't talk. That's the void that I have been launched into. I think there is no way to actually prepare your life or approach when you go through levels of influence so quickly. And now it's like, I have this ability, and how does it change the relationship to what I work on and the problems that I approach to still have impact?

I get invited to all sorts of fancy things, and most of them serve no purpose, and that part is easy: you just have to say no to them. But I have not fully grappled with what it means to have that ability. And I think in the last year I've definitely become more conscious of trying to help AI policy go well.

I think a lot of what we do, building open models that are so well documented with care and explaining all the sides, is probably most impactful on the policy side of things, because open models are so geopolitical and shape how AI will diffuse through the world. That is a very different track from what is so obviously in vogue right now, which is that coding agents are becoming so good and so impactful, and the acceleration that comes with it.

There are attempts to make it seem like these are linked topics where, yes, the Chinese model players are releasing agents as well, but I really think that they're not as linked as people think. And a lot of the open models are overhyped in their abilities.

And it's kind of a recurring thing where, yes, it would be cool if the open models were better, but I just think that it's very different to sit down and use Claude Code with Opus 4.5, or Codex, than it is to play with an open model. That's not to say that they don't matter.

I think it's mostly that I don't like working in geopolitics, but I would say open models are such a case study in emerging technology, and understanding how emerging technology influences the world and creates new pockets of influence is interesting. But that's so new to me. I'm not trained in that at all. So I'm trying to learn.

That's exactly my question. If open models are not that good, why are you so passionate about open models? Why are there so many conversations about open models for the US? What's the reason?

They're going to be the engine for the next 10 years of AI research, because academia has been, I was going to say beat down, which is not the right way to say it, but academia is just in a lull of influence in terms of the evolution of the science, to the point where people will say academic AI research doesn't matter right now. I think that's a bit shortsighted, because it's going to be an engine for exploration in a way that companies can't really nurture, and open models are the platform by which that innovation is happening.

And for a country like the US, that has had such an excellent history of scientific projects and institutions, if they want to be the institution of AI research, they should consider it useful and imperative to have that open model investment be intentional, understood, and something they are in control of. Right now that influence is shifting to China, in both models and in where research is done and shared. We cannot know what the future will hold, but I would ask: is this something you want to be in a position to reap the upsides of? China now has a very interesting ecosystem of open models and research, and we don't know what will fall out of it. It's an unknown in terms of technological progression. The US has the resources to own this, and also such great academic institutions that want to be more activated and more involved.

So when DeepSeek happened, when Chinese open models started just taking off one after another, was it purely geopolitical, just competition with the US? What's the reason behind the Chinese actions?

I think DeepSeek was fairly ideological in it, where they were, in the purest scientific way, wanting to create knowledge and share lots of it with the world and share lots of the upsides. They kind of created an industry standard in China, where DeepSeek was the spark that made Chinese companies interested in participating in AI, and they saw that you could do this through open models.

So lots of them just consider that the default starting behavior. And if you talk to the companies, they're also extremely reasonable: they know that tech companies and potential customers in the US won't sign up for an API where data is sent to China and pay for it. They've told me this. They're like, well, our only other option is open weights, because then they could still most likely use it.

And they know that there are other second-order concerns, where IT departments will be like, well, are the open weights safe? But it's at least a card that they can play, and the companies realize this. I don't think they have a business model figured out any better than Llama had one figured out.

In the next 1 to 3 years, we'll see how funding continues to evolve for open models in the US and China. My hunch would be that the US ecosystem has a lot more liquidity to fund model training efforts.

But then why are the Chinese open models still so legitimately close to the frontier? It's like we don't really know. Opus and GPT 5.2 are, I would say, clearly better than the best open models, but the best open models from China have probably exceeded my expectations in how legitimately good they are.

So there are guesses you could make about the long term of the ecosystem and how much better the models will be, but there's also a lot of real information today that makes it seem like weird things might be going on that we can't quite easily tie a bow around.

Yeah, I just had a conversation with MiniMax researcher Olive Song, and from the conversation with her, it just seems that they are shipping basically every month something new, some new version, and the research just keeps going, you know, nights, days, weekends, it just doesn't matter. People work in shifts. I don't really see anything like this happening in the US, especially since Zuckerberg kind of backed off with Llama.

Who is the player on that level, like DeepSeek, MiniMax, Qwen?

I think the characteristic is that the absolute best talent in China is doing this at this level for open models. That vibe is very similar to what Anthropic, OpenAI, and Gemini have right now, but the talent density there is definitely higher than in open models here, which are often linked to academic projects: AI2 is obviously heavily influenced by UW, and there are Percy Liang's efforts at Stanford. Some of this is a bit of a different talent pool.

I would say the closest thing that the US has is Nvidia's Nemotron efforts, which I think have made a lot of strides in the last 6 to 12 months. My read is that they've kind of figured out some of the internal team and culture alignment, which has let them put out a lot more at higher quality, but it hasn't quite had the top-end breakthrough that something like Qwen or Llama has. So I think that they're going in the right direction, but breaking into the absolute cutting edge of AI takes something special.

It's like, what does it take? Llama was so successful. Qwen is obviously successful. DeepSeek is. I don't think you can just brute force that into existence, but Nvidia is close to that. They also have the business model behind it, because they make it in a way that people will use their hardware and software later. So that makes total sense for Nvidia. There are no other players who would have this opportunity like them.

Yeah. I did an interview with one of the VPs who leads the Nemotron effort (there are three VPs that lead it), which is their open models. That's what I asked him: why do you do this? And he was like, because we're at the frontier of language modeling research and Nvidia is going to sell more GPUs. I was like, damn straight. At least they have a much clearer business model than anybody else does or has in open models. So for that reason I'm optimistic about the longevity of it.

What is the real shift happening in the open ecosystem currently with all these new models, in terms of research? What are the most interesting things happening for you?

People are trying to figure out the right ways to make models and recipes that are extremely compelling, like tool-use agents. And the sense is that the closed labs, the frontier models, have invested so much in these so-called training environments where they can do this post-training in so many different domains, some of them probably useless, many of them fruitful, whereas the open ecosystem is in the early days of creating similar systems for training there.

I think Prime Intellect's Verifiers is one. There are others out there, but it's just so early and unclear what the open, quote unquote, way is to apply all of these into pretty comprehensive training runs in academia. I'm trying to figure out what research means for open models and open specialized models to kind of be more cooperative in a coding agent setup in 6 to 9 months.
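
To make the "verifier" idea concrete, here is a minimal sketch of what such a training environment boils down to: a task carries a prompt plus a programmatic check, and the reward for a sampled completion is whatever the check returns. The names are illustrative, not Prime Intellect's actual Verifiers API.

```python
# Toy verifier-style environment in the RLVR spirit: reward = verify(completion).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VerifiableTask:
    prompt: str
    verify: Callable[[str], float]  # maps a completion to a reward in [0, 1]

def verify_math(completion: str) -> float:
    # Toy check: did the model end with the right final answer?
    return 1.0 if completion.strip().endswith("42") else 0.0

tasks = [VerifiableTask("Compute 6 * 7. End your reply with just the number.", verify_math)]

def score_rollouts(generate: Callable[[str], str], tasks: List[VerifiableTask], n_samples: int = 4):
    """`generate(prompt) -> str` stands in for sampling from the policy model.
    In a real RL loop these per-sample rewards would feed a policy-gradient update."""
    return [[task.verify(generate(task.prompt)) for _ in range(n_samples)] for task in tasks]
```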

So, if you're running a lab that can train some smaller models and hill climb on tasks, but isn't going to compete with the likes of Claude or Qwen or these much more established, quote unquote, rich agencies, there's clearly going to be a multi-agent future where multiple models can be used.

The absolute top end of academic work could still tap into things that are actually used, but the ecosystem doesn't really operate in that way: if I'm using Claude, I don't offload to a local model that can read all my files or something like this. There's a future there that is coming, and there are a lot of open-source coding agents, like OpenCode and stuff like this, where they are trying to figure out how to make coding agents around open models.

So in terms of the area where there will probably be the most dynamism and excitement, I still think it's going to be this kind of tool-use post-training. Hopefully generalizing across many environments is exciting, because it's the path towards what they're doing at the frontier, but there's the chance that the frontier kind of becomes even more separated from academic and small-scale open models this year.

Compute spend is forecasted to go up and up for all these frontier labs, with more compute coming online. Is post-training the hardest part to make open?

They all have different challenges. I think pre-training data is the hardest legal part to get open, because you want to get every possible corpus on the internet and of human knowledge. Obviously, some of those have historically been fairly litigious if you put them out openly.

I think post-training tends to be fairly complex at the frontier, with a lot of models and a lot of sequencing, a hard decision-making process of what you put together into the final recipe. There was a big increase in the complexity of infrastructure for post-training with this RLVR and scaling reinforcement learning revolution.

At the end of the day, a lot of it seems like pre-training, where the open-source infrastructure for pre-training, with things like Megatron-LM from Nvidia, is actually very strong. So some of these labs that are raising hundreds of millions of dollars to train models, they just take this open NVIDIA software and they make it work.

I think reinforcement learning is in the era of many, many libraries, but over the years it'll distill down to a few libraries that actually work fairly well. In the meantime, the complexity of taking a library and doing it very well, training very well at the cutting edge of post-training, is pretty hard. Post-training data is potentially further behind, but it's not like there are that many open data sets anywhere in the spectrum of training. I think that high-quality released data is just super, super rare these days.

Do you work on this at the Allen Institute for AI?

Yeah, we touch on all of these things. For example, we released Olmo 3 in the fall, and one of the current things is trying to transition it to be more agentic and more tool-use oriented. What the workflow looks like is that you find the existing open data sets, you look at the evaluations you want to improve on, and you try them.

And I expect that for some evaluations, you can look at something like Artificial Analysis and go through the list. For some of them we will find open data that makes it fairly easy to hill climb on them. And for other things that might be super popular with closed labs, we would potentially have to buy data, on the order of millions of dollars. I just expect there to be gaps because so few people release the data.
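
As a rough sketch of that loop (find open data, pick target evaluations, keep whatever moves the numbers), something like the following; the dataset and eval names, plus the `finetune` and `run_eval` helpers, are placeholders rather than AI2's actual tooling.

```python
# Greedy hill-climbing over candidate data mixtures against a fixed set of evals.
candidate_datasets = ["open_tool_use_sft", "open_agent_traces", "open_code_sft"]  # illustrative
target_evals = ["tool_use_bench", "agentic_coding_bench"]                          # illustrative

def hill_climb(base_model, finetune, run_eval):
    best_model = base_model
    best_scores = {e: run_eval(best_model, e) for e in target_evals}
    for dataset in candidate_datasets:
        candidate = finetune(best_model, dataset)
        scores = {e: run_eval(candidate, e) for e in target_evals}
        # Keep the new mixture only if it improves the average of the target evals.
        if sum(scores.values()) / len(scores) > sum(best_scores.values()) / len(best_scores):
            best_model, best_scores = candidate, scores
    return best_model, best_scores
```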

It would seem oddly convenient if academics happened to make all the data sets we need to try to do the things that the frontier models are doing, when you know the frontier model playbook is to buy a lot of this data, at least to get the flywheel going, right? So I think there are natural barriers, but that's also why open models end up having a natural lag behind closed models.

It's like, once the closed models are good at it, it's a lot easier to create training data with those models and put human effort into the gaps. That's kind of the ever-evolving dance. People like to overhype open models, in my opinion, as in, oh, they're going to cross closed models, and I just think the equilibrium is going to continue where the best open models are some 6 to 9 months behind the best closed models, and that's fine. That's a pretty short timeline given how good the best closed models are now. But I don't see the dynamic changing. If anything, it might err slightly on the side of the closed models being more ahead.

But even 9 months of a gap is crazy. You don't think they will catch up?

There's no reason to think that the open models will. They have fewer resources, and resources normally determine the outcome. It's resources and talent that determine the outcome. Arguably in Chinese labs the talent is proportionally similar to the likes of OpenAI and Anthropic, but the resources in terms of compute and the ability to buy data are just so much lower.

At the end of the day, it's mostly compute that people need to make improvements to the model, because compute is spent either on training or on generating large amounts of synthetic data. For example, in Olmo 3, this isn't super clearly documented, but we spent millions of dollars on synthetic data. A lot of it was through a grant from the federal government, for this frontier supercomputer in the US, to generate synthetic data.

But if even AI2 is spending millions of dollars of effective compute on synthetic data and training, you can guess that the compute expenditure is more like billions at these frontier labs. These are just huge costs, constantly, that the western companies are way more capitalized to handle, and these are closed labs right now.

The gap is kind of not infinite. If they are behind only 6 to 8 months, and if usage is already possible for businesses and individual people, we can imagine the situation in 6 months where you will be able to use an open model just for your daily life, the way you use, I don't know, ChatGPT or Claude. Don't you think so?

It's a bit of a bet. I think on the chatbot side open models will definitely be there. There's some robustness and some tool-use stuff where I haven't seen an open model that's quite as good at search as, say, GPT 5.2. But in this chat interface, I think it will be there. The coding agents, potentially not, but it comes down to a bet.

And I would say that I bet on the side that the closed models are so early in their interest in these coding agents. A lot of the way this happens is that the researchers become interested in a domain and in a way of using the models, and then they take on improving them.

Claude Code and the like came out last April, and adoption was not front-led by the training teams. I've talked to people at these companies and they're like, "Wow, yeah, Claude Code with Opus 4.5 is when I finally started using coding agents a couple months ago." And if they're still just getting obsessed with them, then they will decide, oh, we need to train the model that's really, really good at this. So there's still this catch-up time where they're going to start turning that crank, and I think the coding agents will get much better.

It comes down to a bet. You could say, "The upside is not going to be proportionate to the cost, and open models will catch up." It just seems like there is no clear indication that the models will actually hit a wall. Everybody I talk to says there's a lot of low-hanging fruit; it's complex technical work in terms of research and execution and cost, and we need to keep turning the cranks.

I think obviously there's the macroeconomic sense where there's a time gating on the companies to show the value of it. But it seems like this Claude Code moment at least buys them a lot more time to spend.

What low-hanging fruits do you see?

It's literally like anywhere. A lot of it comes in different flavors of: we put this data set in and it helped a lot, how do we make it 10x bigger? Or we put this data set in, it helped a lot, and we didn't really filter it very well yet, so let's filter it more. Or our training code only reaches 60% GPU utilization, so we can make it 10% faster by writing some better kernels, and therefore all of our experiments are 10% faster. And if you do that 40 times, you end up with something like a 4x faster codebase, all of your experiments are way faster, and you can do more complicated ideas and things.
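
A quick bit of arithmetic on why those small wins matter: fractional speedups compound multiplicatively, so the per-tweak gain and the number of tweaks set the ceiling. The numbers below are illustrative, not from the conversation.

```python
# Compounding of repeated fractional speedups (idealized: assumes the wins stack cleanly).
def compound_speedup(per_change_gain: float, n_changes: int) -> float:
    return (1 + per_change_gain) ** n_changes

print(round(compound_speedup(0.035, 40), 1))  # forty ~3.5% wins -> ~4x overall
print(round(compound_speedup(0.10, 40), 1))   # forty 10% wins -> ~45x if nothing overlapped
# In practice optimizations overlap and interact, so realized multipliers sit well
# below the idealized product; dozens of kernel-level tweaks landing around "a few x"
# is the more typical outcome.
```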

So it's pretty much all of these things, from the most established, which is we have our pre-training data set that we've been filtering and iterating on for years and are still tweaking in order to better serve the current tasks of interest and be more efficient, to the cutting-edge things: we just spent $40 million on three training environments for coding agents because we have all these new users and Codex, we just plugged it in on our first run and some numbers go up, how do we pick the next thing to do there? Or complicated things like: we try to just make the model bigger and the numerical issues are too hard, so how do we come up with a new RL algorithm that handles the numerics better?

So I just think that there are constantly these little problems of what is better suited to the situation at hand. A lot of the open models are switching to these hybrid architectures, which are a mix of this linear attention and the traditional attention. Part of why I think that's the case is they're just more complex numerically, but also the upside on downstream RL and inference is so high, where you can save so much on RL, that it's kind of the industry collectively pushing through the next harder thing to do.
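
For a sense of what "hybrid" means structurally, here is an illustrative layer pattern: mostly linear-attention blocks (constant-size state, cheap long-generation inference, which is where the RL savings come from) with periodic full softmax-attention blocks for exact recall. The 3:1 ratio and names are assumptions for illustration, not any specific model's recipe.

```python
# Illustrative decoder layer pattern for a hybrid linear/full-attention stack.
N_BLOCKS = 32
LINEAR_PER_FULL = 3  # three linear-attention blocks for every full-attention block

def layer_pattern(n_blocks: int, linear_per_full: int) -> list:
    pattern = []
    for i in range(n_blocks):
        if (i + 1) % (linear_per_full + 1) == 0:
            pattern.append("full_attention")    # quadratic in context, exact token recall
        else:
            pattern.append("linear_attention")  # recurrent/state-space style, O(1) state per token
    return pattern

print(layer_pattern(N_BLOCKS, LINEAR_PER_FULL))
```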

It seems like Qwen 3 has a linear model, Kimi has a linear model, and you're working on a linear model; people found it scraping GitHub. Why does that happen?

It's a collective readiness of infrastructure and ideas being tested at smaller scales so that they work. I asked the lead of the project at AI2: do you think all the models will be hybrid in the future, and why did this not happen two years ago when the hype for Mamba was really high? And it's just that a few settings on Mamba needed to be figured out: the Mamba models did really well on pre-training benchmarks, but the actual text generation from them wasn't nice, and there were just a few things in the architecture that needed to be changed, balancing them better with traditional transformer-style architectures. For that reason, and a bit more tinkering, the models are a lot more stable and people are trying to use them.

That's an example of something playing out over two and a half years. I remember when the hype for Mamba and state-space models was so high, but now it seems to be really hitting a lot of places; Nvidia had a model with hybrid attention as well. How do you predict that timeline? I don't know. But I think there are definitely plenty of things like that that are going to continue to evolve from AI research to reality. The big question is what Sam Altman does with the fundraising engine.

I don't think that's going to be a bubble-popping thing, but there could be corrections on Nvidia stock or stuff in this time. I don't know; people are rumoring OpenAI and Anthropic IPOs by the end of the year. I wouldn't be that surprised.

Yeah. What do you think about SpaceX and their excitement?

Musk Industries, for better or worse, has a bit of villain vibes, but in a sci-fi movie way. I'm not enough of a businessman to comment on what is actually happening, but I would guess that there's a reason other than just greed and recouping losses from X or xAI. Whatever opinion on Elon you have, especially on the political side, which I think he should spend less time in, he has such a track record in building businesses that I have to think that there is a plan there.

And I think that the recent Tesla comments about stopping the Model S and the Model X are fairly shocking to me, particularly because the timeline on robotics seems too soon, and my intuition is that scaling robotics now is a bit too soon. But when bets that serious are made by somebody like Elon, there have to be reasons and things that I don't know. As a fairly ignorant person about SpaceX and Tesla and large-scale robotic manufacturing, I'm not informed here, so we'll see. You can add a couple of years to what Elon Musk projects, but eventually he will reach the goal. That's the track record.

Yeah. Everything you described from this research point of view is mostly with transformers and occasional hybrid models. Do you see serious research going into some other areas?

For whatever reason, there's this kind of continual learning hype point. I'm kind of in the bucket that, like in the coding agents, we have these CLAUDE.md and AGENTS.md files, and the models are actually fairly good at learning from them, and in the short term that'll do a lot, but obviously there's such a big gain in something more. I think continual learning is best thought of as an example of a research problem that will eventually be solved and is wonderfully motivated, but you'll never know when a real solution hits at scale.

And the idea is that the model weights should change based on experience personal to you, or to the model in the world. That is so reasonable as a motivation when training is so expensive and the models can seem so dumb in some situations. It just kind of has to be true. The likelihood that we're on something that looks like a transformer in 20 years? I don't know, it's not that high. We have the hybrid model thing, and a couple more transformations down the line, are people going to still call it a transformer? I don't know. It seems like attention is pretty good. It's just so hard to predict the research evolution.

I'm asking this question because on the Lex Fridman podcast you said that if you were starting now, you probably would not do transformers. Is this for an academic?

Yeah, the best research is further out. I think of deep learning and transformers as kind of fundamental skills that you have as a CS researcher. When I was starting, it was like, okay, you just need to learn how deep learning backprop libraries work, and now people need to learn how transformers work. But if you're doing work that is so blatantly just improving on what we already have, it's going to be kind of hard to market it academically, unless you're at the absolute top end of academia where they have more resources and industry connections to really stay at this frontier.

So some of the labs at UW and Stanford and Berkeley do this, but most of academia is not like this. So you need something that is less busy or takes longer. Another way to describe it is: find space in between two popular ways of thinking or two subfields, or just go where there are not people, because some things are by their nature very likely to work in AI. The amount of things that just kind of work is really, really high if you set the optimization up right.

I don't know if I love my own advice, because it's easy to say, go off into somewhere, but it's also hard to make it work. Like, don't do transformers. Okay, so what do I do? I don't know.

I mean, you did reinforcement learning when it was kind of a dump.

Yeah. I think we also talked about robotics. Working in robotics, I've seen many people be so happy. It's a bit more grounded, but you accumulate the benefits of what is happening in this language model revolution while being a bit more secondhand to it. So you're probably not subject to the whiplash, but you're getting the upside of new types of things working and the ability to try new things in a specific domain.

If we talk more about practical implementation of open models: first of all, if you don't mind, let's start from kind of defining what an open model, or open-source AI, is. And I don't even know if it matters. Do you think the definition matters?

The definition doesn't matter in the classical sense of the debate about assigning things to it. It matters in what the community definition is, and the community definition of open source is really that the weights are released and available openly so that anyone can use them. And that chimes into the license discussion, where one of the better things these Chinese labs have done is they're all just releasing their models with extremely permissive licenses, where a lot of US companies have done things that are non-commercial or have extra terms and conditions attached.

And in enterprise situations, lawyers do not like vague extra terms attached. A lot of people that end up writing these custom AI licenses do it in ways where there's vague language and exposure to risk. So just having simple terms where it's clear that people can use them is great. In the past there's been a lot of debate on whether for open source you need the code and the data available, which, yes, those are valuable resources, and yes, that fits more with the open-source software motivations of things being free and reproducible and modifiable and so on. But the debate has kind of died down as people actually just use open-weight models. This is the thing, and this is the paradigm by which people are operating in something that's more open with AI.

So I generally am like, "Oh, it's fine." I'm happy to not be in stupid debates on, "Oh, is this open source or not?" That was not fun. And still, there's a group that is almost fully transparent, fully open source: AI2, Percy Liang's Marin thing, Hugging Face, LLM360, which is also affiliated with MBZUAI, the UAE thing. The Swiss had a project that was very open. So there are actually more people, especially in the so-called western ecosystem, that are doing this fully open thing, which I think shows a very good appreciation for scientific standards and scientific progress, but it's still niche.

The fact that these are fully open is not causing dramatically more uptake. It would be nice if some company was like, "Oh, these are fully open, so I can iterate faster," and so on. But just using a better model is way more of a benefit than having the model be fully open in terms of real-world applications.

My bet is that 2026 will be the year when people, and enterprises too, will start using open models much more. How do you see it?

I think they are; I think it's been ongoing. A lot of companies' default position is that they want to use open models for information security, more predictable costs, and owning their stack, and it takes a long time for companies to move into these new types of technology. I suspect it'll look like all the same things: there are all these surveys of companies asking whether they are getting benefit out of using AI tools in their work, and it'll be noisy for a very long time, but then it'll be seen that a lot of companies are using open models and fine-tuning them for their use cases in ways that create a lot of value. It's just going to take a long time, especially if any training is involved, where you just can't take a model off the shelf, because it's hard enough to thoroughly test a model. To actually have to do the training yourselves is a months-long commitment, and you need to have a team of dedicated people.

I started to learn about the economics of open-source AI. If you are a company and you want to use an open model, it's sometimes prohibitively expensive. So a free open-source model is not cheap.

Yeah. Because you have to commit to a certain amount of compute to serve a model in a meaningful way, and you could very easily buy the minimum compute to host said model and get nowhere near the inference load to, frankly, support that spend. That's why some of the disaggregated inference companies make sense, where they will get enough load to make certain features worthwhile and stuff like this. It just takes time to build that up.
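
A back-of-envelope version of that economics, with every number below an assumption purely for illustration: self-hosting only beats paying per token once sustained utilization of the reserved GPUs gets high enough.

```python
# Break-even utilization for self-hosting an open model vs. paying a hosted API.
gpu_hourly_cost = 2.50        # assumed $/hour per rented GPU
gpus_required = 8             # assumed minimum to host the model at acceptable latency
tokens_per_second = 2_000     # assumed aggregate throughput at full load
api_price_per_million = 3.00  # assumed $ per 1M tokens from a hosted API

hourly_cost = gpu_hourly_cost * gpus_required          # $20/hour for the cluster
max_tokens_per_hour = tokens_per_second * 3600         # 7.2M tokens/hour at full load
api_cost_at_full_load = api_price_per_million * max_tokens_per_hour / 1e6  # ~$21.6/hour
break_even_utilization = hourly_cost / api_cost_at_full_load

print(f"Break-even utilization vs. the API: {break_even_utilization:.0%}")
# With these toy numbers you need ~93% sustained utilization just to match API pricing,
# which is why low-traffic deployments rarely pencil out and aggregating load helps.
```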

How is it organized at AI2? Do you guys use open models in the institute?

Some people do, but not a lot. I think the core thing that AI2 needs to get to, for the models to take the next step, is that you actually have to dogfood the models and build development loops between using the model and improving the model and getting real feedback. Until then it's somewhat of a toy project: there are users and there are things it's deployed into, but if you're not touching it, why do you expect other people to do so with serious intent?

One thing at a time: do you actively use open models yourself, or closed ones and APIs?

I mostly use closed models. I try the open ones, especially around launches. They're not good enough or cheap enough that it's obvious I should switch my habits. And a lot of this is classic product patterns in tech, where users have strong habits. I see the slow evolution back and forth between, say, ChatGPT and Claude for certain uses, and it takes a long time for those habits to shift. Even though I think Claude's language in normal conversation is much more succinct and palatable than a lot of the GPT-5 stuff, Claude models have been like that for a while; it's just taken me months to shift the habit. This evolution has happened multiple times in AI's recent history, where I go to an open model and it's just fine, and it's like, in a product sense, why would I use this?

Florian, who works with me at Interconnects to study open models, has a few things he uses them religiously for, because it's only with the open models that you can do this. He uses one of Nvidia's Parakeet models, which is speech-to-text, as a much faster way to input comprehensively into Claude Code. So he will speak to his AI, Parakeet will transcribe it, and then Qwen 4B will rewrite it and pass it to Claude. That's a very real use case that he swears by and gets a lot of value out of. But I'm like, hm, I haven't done it yet. I haven't used these substantially.
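
The pipeline is simple enough to sketch; the three stages are passed in as callables because the exact libraries and CLIs used to wire up Parakeet, the Qwen rewrite, and Claude Code are not specified here, so treat this as illustrative glue rather than the actual setup.

```python
from typing import Callable

def voice_to_agent(
    audio_path: str,
    transcribe: Callable[[str], str],      # e.g. an NVIDIA Parakeet speech-to-text model
    rewrite: Callable[[str], str],         # e.g. a small local Qwen model cleaning up rambling speech
    send_to_agent: Callable[[str], None],  # e.g. handing the result to Claude Code
) -> str:
    raw = transcribe(audio_path)           # fast local speech-to-text
    instruction = rewrite(raw)             # local rewrite: free, private, low latency
    send_to_agent(instruction)             # the closed model does the heavy lifting
    return instruction
```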

Yeah, it's surprising how hard it is to change a habit with a model. I still struggle with 5.2. I think they changed something; I don't know what they changed. No matter what character I choose, it sucks. I still use it for thinking, I still like the deep research part of it, but when I read what it tells me... and it doesn't follow the rules. I don't know how to make them follow the rules.

It's pretty funny, because I feel like everybody that's deep in it kind of has these opinions of, what is this problem? And that's what I mean by the low-hanging fruit. People just need to fix that. That's actually fixable. It just takes a lot of work. Training to make the model actually remember the constant prompts, like the same prompt that I put in the memory.

Yeah, it's just changing data and creating new data. The process of getting that type of potentially niche problem to be represented, when you're solving thousands of use cases at once, takes a lot of balancing. You probably need to build an evaluation for it so that you can automatically measure it, and then you have one more evaluation; they probably have hundreds of evals. There are just a lot of moving pieces to get right. So I think they're fairly conservative about what things are really the priority, and they probably are willing to take regressions on some things, like memory management, to push the frontier in utility.
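
A toy version of what "build an evaluation for it" could look like for this specific annoyance: store the user preference, run a handful of prompts, and check programmatically whether responses respect it. The rule, prompts, and `generate` hook are placeholder assumptions.

```python
# Tiny automatic check: does the model respect a preference stored in "memory"?
MEMORY_RULE = "Always answer in bullet points."  # illustrative stored preference

test_prompts = ["Summarize the plot of Dune.", "What should I cook tonight?"]

def respects_rule(response: str) -> bool:
    # Crude check for the illustrative rule: every non-empty line is a bullet.
    lines = [l for l in response.splitlines() if l.strip()]
    return bool(lines) and all(l.lstrip().startswith(("-", "*", "•")) for l in lines)

def evaluate(generate) -> float:
    """`generate(system, user) -> str` is a stand-in for whatever model API is used."""
    passed = sum(respects_rule(generate(MEMORY_RULE, p)) for p in test_prompts)
    return passed / len(test_prompts)
```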

If you're going to take a gigantic step in the things you were targeting, and you have some second-order effects that you're going to revisit later, you're probably going to do it, right? The focus now is on coding, and in coding you don't really need this beautiful
