Lex Fridman
January 31, 2026

State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490

Authors: Lex Fridman, Sebastian Raschka, Nathan Lambert

Date: January 31, 2026

This summary covers the core technical and economic forces shaping LLMs. It's for builders and investors trying to find real value and navigate the global AI competition.


This episode answers: Who leads the global AI race, and why are China's open-weight models so influential? How are scaling laws evolving, and what does Reinforcement Learning with Verifiable Rewards (RLVR) mean for AI capabilities? What are the hidden costs and human implications of AI's rapid advancement, from burnout to the future of work?

Lex Fridman hosts Sebastian Raschka and Nathan Lambert, dissecting the AI arena. They unpack relentless innovation, global compute competition, and architectural shifts, diving into AI's technical and strategic implications.

Top 3 Ideas

🏗️ The Compute Arms Race:

"No company will have exclusive tech access. Budget and hardware are the true differentiators."
  • Talent & Hardware: AI ideas flow freely, making proprietary tech fleeting. Massive compute and budget differentiate players, favoring well-funded giants like Google's TPUs.
  • Open Model Influence: Chinese companies like DeepSeek win "hearts" with open-weight models, building influence where users resist proprietary APIs.

🏗️ Scaling Laws Redefined:

"Scaling laws show a predictable power-law relationship between compute/data and prediction accuracy, defining model improvement."
  • Beyond Pre-training: Scaling laws extend to post-training (RLVR) and inference. Models get smarter from "thinking" time and self-correction, not just data.
  • RLVR's Magic: Reinforcement Learning with Verifiable Rewards (RLVR) trains models to solve problems via trial, feedback, and self-correction. It teaches how to reason.
  • Cost vs. Capability: Pre-training is expensive. Inference scaling and RLVR offer attractive performance boosts. Investment shifts to optimizing how models use knowledge.
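The power-law relationship quoted above can be sketched numerically. The toy below is purely illustrative: the coefficients `a`, `alpha`, and `irreducible` are made-up assumptions, not values fitted to any real model.

```python
def power_law_loss(compute, a=10.0, alpha=0.05, irreducible=1.7):
    """Toy scaling law: loss = irreducible + a * C^(-alpha).

    All coefficients are illustrative assumptions, not fitted values.
    """
    return irreducible + a * compute ** (-alpha)

# Each doubling of compute shrinks the reducible part of the loss by the
# same constant factor (2^-alpha), which is what makes scaling
# "predictable": the curve is a straight line on a log-log plot.
for c in [1e21, 2e21, 4e21]:
    print(f"compute={c:.0e}  predicted loss={power_law_loss(c):.4f}")
```

This is why labs can forecast how much an extra order of magnitude of compute should buy before spending it.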

🏗️ The Jagged Future of AI:

"The dream of a single, all-encompassing AI is dying. Specialized models suggest a future of diverse, targeted AI, not one general intelligence."
  • Specialization over Generalization: The "one model to rule all" vision fades. AI excels in specific domains (coding, math), struggling with general computer use.
  • Human-AI Collaboration: Superhuman coders emerge, but humans remain vital for system design, goal setting, and filtering AI-generated content. New roles guide AI output.
  • The Value of Struggle: Over-reliance on AI risks hindering human expertise. Use AI for mundane tasks, but preserve human "struggle" for deep learning and problem-solving.

Actionable Takeaways

  • 🌐 The Macro Shift: Global AI pivots from raw model size to sophisticated post-training and efficient inference. China's open-weight models force a US strategy re-evaluation.
  • ⚡ The Tactical Edge: Invest in infrastructure and talent for RLVR and inference-time scaling. These frontiers enable new model capabilities and economic value.
  • 🎯 The Bottom Line: AI's relentless progress amplifies human capabilities. Focus on systems augmenting human expertise and navigating ethical complexities. Real value lies in intelligent collaboration.
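To make the RLVR idea in the takeaways concrete: the "verifiable reward" is just a check of the model's final answer against ground truth. The sketch below is a toy illustration of that idea, not any lab's actual implementation.

```python
def verifiable_reward(model_answer: str, reference: str) -> float:
    """RLVR-style reward: 1.0 if the model's final answer matches a
    checkable ground-truth reference (e.g. a math answer), else 0.0.
    Toy illustration; real systems use more robust answer extraction."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

# The policy is then trained (e.g. with policy-gradient RL) to maximize
# this reward over many sampled attempts, which is how trial, feedback,
# and self-correction get baked into the model.
print(verifiable_reward(" 42 ", "42"))  # exact match after stripping
print(verifiable_reward("41", "42"))    # wrong answer, zero reward
```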

The following is a conversation all about the state-of-the-art in artificial intelligence, including some of the exciting technical breakthroughs and developments in AI that happened over the past year and some of the interesting things we think might happen this upcoming year.

At times it does get super technical, but we do try to make sure that it remains accessible to folks outside the field without ever dumbing it down. It is a great honor and pleasure to be able to do this kind of episode with two of my favorite people in the AI community, Sebastian Rashka and Nathan Lambert.

They are both widely respected machine learning researchers and engineers who also happen to be great communicators, educators, writers, and X posters. Sebastian is the author of two books I highly recommend for beginners and experts alike: Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch).

I truly believe that in the machine learning and computer science world, the best way to learn and understand something is to build it yourself from scratch. Nathan is the post-training lead at the Allen Institute for AI and the author of the definitive book on reinforcement learning from human feedback.

Both of them have great X accounts and great Substacks, Sebastian has courses on YouTube, Nathan has a podcast, and everyone should absolutely follow all of those. This is the Lex Fridman podcast. To support it, please check out our sponsors in the description, where you can also find links to contact me, ask questions, give feedback, and so on.

And now, dear friends, here's Sebastian Raschka and Nathan Lambert.

So, I think one useful lens to look at all of this through is the so-called DeepSeek moment. This happened about a year ago, in January 2025, when the open-weight Chinese company DeepSeek released DeepSeek R1, which, I think it's fair to say, surprised everyone with near or at state-of-the-art performance, allegedly with much less compute and for much cheaper. From then to today, the AI competition has gotten insane at both the research level and the product level. It's just been accelerating.

Let's discuss all this today and maybe let's start with some spicy questions if we can. Who's winning at the international level? Would you say it's the set of companies in China or the set of companies in the United States? And Sebastian, Nathan, it's good to see you guys.

So Sebastian, who do you think is winning?

So, winning is a very broad term. You mentioned the DeepSeek moment, and I do think DeepSeek is definitely winning the hearts of the people who work on open-weight models, because they share theirs as open models.

Winning, I think, has multiple time scales to it: we have today, we have next year, we have ten years out. One thing I know for sure is that I don't think, nowadays in 2026, there will be any company that has access to a technology no other company has access to.

And that is mainly because researchers are frequently changing jobs, changing labs; they rotate. So I don't think there will be a clear winner in terms of technology access. However, I do think the differentiating factor will be budget and hardware constraints.

So I don't think the ideas will be proprietary, but the resources needed to implement them will differentiate. And so I don't currently see a winner-take-all scenario; I can't see that at the moment.

Nathan, what do you think?

You see the labs put different energy into what they're trying to do. And to demarcate the point in time when we're recording this: the hype over Anthropic's Claude Opus 4.5 model has been absolutely insane. I've used it and built stuff with it in the last few weeks, and it's almost gotten to the point where it feels like a bit of a meme in terms of the hype.

And it's kind of funny, because this is very organic. If we go back a few months (we can get the release date in the notes), Gemini 3 from Google got released, and it seemed like the marketing and wow factor of that release were super high.

But then at the end of November, Claude Opus 4.5 was released, and the hype has been growing, even though Gemini 3 came before it. And it kind of feels like people don't really talk about it as much, even though when it came out, everybody was saying, "This is Gemini's moment to retake Google's structural advantages in AI."

And Gemini 3 is a fantastic model, and I still use it. Its differentiation is just lower. And I agree with what you're saying, Sebastian: the idea space is very fluid, but culturally, Anthropic is known for betting very hard on code, and the Claude Code thing is working out for them right now.

So I think that even if the ideas flow pretty freely, so much of this is bottlenecked by human effort and the culture of organizations. Anthropic seems to at least be presenting as the least chaotic, which is a bit of an advantage if they can keep doing that for a while.

But on the other side of things, there's a lot of ominous technology from China, where there are way more labs than DeepSeek. DeepSeek kicked off a movement within China, similar, I'd say, to how ChatGPT kicked off a movement in the US where everything had a chatbot.

There are now tons of tech companies in China releasing very strong frontier open-weight models, to the point where I would say DeepSeek is kind of losing its crown as the preeminent open model maker in China.

And the likes of Z.AI with their GLM models, MiniMax's models, and Moonshot's Kimi, especially in the last few months, have shone more brightly. The new DeepSeek models are still very strong, but in retrospect this could look like a big narrative point: in 2025, DeepSeek came along and provided this platform for way more Chinese companies to release these fantastic models and run this new type of operation.

So these models from these Chinese companies are open weights, and depending on the trajectory of the business models these American companies are pursuing, those could be at risk. But currently a lot of people are paying for AI software in the US, while historically in China and other parts of the world people don't pay a lot for software.

So some of these models, like DeepSeek, have the love of the people because they are open weight.

How long do you think the Chinese companies will keep releasing open-weight models?

I would say for a few years. In the US there's not a clear business model for it. I have been writing about open models for a while, and these Chinese companies have realized the same thing. I get inbound from some of them, and they're smart and recognize the same constraints: a lot of US tech companies and other IT companies won't pay for an API subscription to Chinese companies because of security concerns.

This has been a long-standing habit in tech, and the people at these companies see open-weight models as a way to build influence and take part in a huge, growing AI expenditure market in the US. They're very realistic about this, and it's working for them. And I think the government will see that this is building a lot of influence internationally in terms of uptake of the technology.

So there are going to be a lot of incentives to keep it going, but building these models and doing the research is very expensive. At some point I expect consolidation, but I don't expect that to be a story of 2026: there will be more open model builders throughout 2026 than there were in 2025, and a lot of the notable ones will be in China.

You were going to say something.

Yes. You mentioned DeepSeek losing its crown. I do think so, to some extent, but we also have to consider that they are still, I would say, slightly ahead. And it's not that DeepSeek got worse; it's that the other ones are using the ideas from DeepSeek.

For example, you mentioned Kimi: same architecture, and they're training it. And then again we have this leapfrogging, where one might be a bit better at some point in time because it has the more recent model. And I think this comes back to the fact that there won't be a clear winner.

It will just keep going like that: one lab releases something, then another comes in. And the most recent model is probably always the best model.

We'll also see that the Chinese companies have different incentives. DeepSeek is very secretive, whereas some of these startups, like the MiniMaxes and Z.AIs of the world, have literally filed IPO paperwork and are trying to get Western mindshare and do a lot of outreach there.

So I don't know if these incentives will change the model development, because DeepSeek famously is built by a hedge fund, High-Flyer Capital, and we don't know exactly what they want. We don't know what they use the models for, or if they care about this. They're secretive in terms of communication, but not in terms of the technical reports that describe how their models work. They're still open on that front.

And we should also say, on the Opus 4.5 hype, there's the layer of something being the darling of the X/Twitter echo chamber versus the actual number of people using the model. I think it's probably fair to say that ChatGPT and Gemini are focused on the broad user base that just wants to solve problems in daily life, and that user base is gigantic.

So the hype about the coding may not represent the actual use. I would say a lot of the usage patterns are, like you said, name recognition and brand, but also almost muscle memory: ChatGPT has been around for a long time, people just got used to using it, and it becomes almost a flywheel where they recommend it to other users.

One interesting point is also the customization of LLMs. For example, ChatGPT has a memory feature. You may have a subscription and use it for personal stuff, but I don't know if you want to use that same thing at work, because that's a boundary between private and work. If you're working at a company, they might not allow that, or you may not want that.

And I think that's also an interesting point, where you might have multiple subscriptions. One is just clean: it has nothing of your personal images or hobby projects in there; it's just the work thing. And the other one is your personal thing.

So I think those are two different use cases, and it doesn't mean you have to have only one. I think the future is also multiple ones.

What model do you think won 2025, and what model do you think is going to win 2026?

In the context of consumer chatbots, it's a question of whether you're willing to bet on Gemini over ChatGPT, which in my gut feels like a bit of a risky bet, because OpenAI has been the incumbent, and there are so many benefits to that in tech. But I think the momentum in 2025 was on Gemini's side, though they were starting from such a low point. RIP Bard and those earlier attempts at getting started; huge credit to them for powering through the organizational chaos to make that happen.

But also, it's hard to bet against OpenAI, because they always come off as so chaotic, yet they're very good at landing things. And personally, I have very mixed reviews of GPT-5, but it had to have saved them so much money, with the headline feature being a router, so that most users are no longer incurring as much in GPU costs.

So I think it's very hard to dissociate the things that I like out of models versus the things that are going to actually be a general public differentiator.

What do you think about 2026? Who's going to win?

I'll say something even though it's risky. I think Gemini will continue to make progress on ChatGPT. When both of these are operating at such extreme scales, Google has the ability to separate research and product a bit better, whereas you hear so much about OpenAI being chaotic operationally and chasing the high-impact thing, which is a very startup culture. And then on the software and enterprise side, I think Anthropic will have continued success, as they've again and again been set up for that. Obviously Google Cloud has a lot of offerings, and I think this Gemini name brand is important for them to build. Google Cloud will continue to do well, but that's a more complex thing to explain in the ecosystem, because it's competing with the likes of Azure and AWS rather than on the model provider side.

So, on infrastructure, you think TPUs give an advantage? Largely because the margin on Nvidia chips is insane, and Google can develop everything from top to bottom to fit their stack and not have to pay that margin, and they've had a head start in building data centers.

So for all of these things that have both long lead times and very high margins on high costs, Google has a historical advantage. And if there's going to be a new paradigm, it's most likely to come from OpenAI, whose research division again and again has shown this ability to land a new research idea or product.

Deep research, Sora, o1-style thinking models: all of these definitional things have come from OpenAI, and that's got to be one of their top traits as an organization. So it's kind of hard to bet against that. But I think a lot of this year will be about scale and optimizing what could be described as the low-hanging fruit in models.

And clearly there's a trade-off between intelligence and speed. This is what GPT-5 was trying to solve behind the scenes: do people, the broad public, actually want intelligence, or do they want speed?

I think it's a nice variety, actually, or the option to have a toggle there. For my personal usage, most of the time when I look something up, I use ChatGPT to ask a quick question and get the information; I want it fast. For most daily tasks I use the quick model nowadays. I think the auto mode is pretty good, where you don't have to specifically say thinking or non-thinking. Then again, I also sometimes want the pro mode. Very often, when I have something written, I put it into ChatGPT and say, hey, do a very thorough check: are all my references correct, are all my thoughts correct?

Did I make any formatting mistakes? Are the figure numbers wrong, or something like that? And I don't need that right away. I finish my stuff, maybe have dinner, let it run, come back, and go through it. And this is where I think it's important to have this option. I would go crazy if I had to wait 30 minutes, or even 10 minutes, for each query. That's me.

I'm sitting over here losing my mind that you use the router and the non-thinking model. "How do you live with that?" is my reaction. I've been heavily on ChatGPT for a while and never touched 5 non-thinking. I don't like its tone, and it has a higher likelihood of errors.

Some of this is from back when OpenAI released o3, which was the first model to do this deep search: find many sources and integrate them for you. So I became habituated to that, and I will only use GPT-5.2 thinking or pro for any sort of information query for work, whether that's a paper or some code reference that I found. I will regularly have five pro queries going simultaneously, each looking for one specific paper or feedback on an equation or something.

I have a funny example where I just needed an answer as fast as possible, before this podcast, as I was about to go on the trip. I have a local GPU running at home, and I wanted to run a long RL experiment. And usually I also unplug things, because if you're not at home, you don't want to have things plugged in.

And I accidentally unplugged the GPU machine. My wife was already in the car, and it was like, "Oh, dang." So I wanted, as fast as possible, a bash script that runs my different experiments and the evaluation. And I did something I know; I've learned how to use the bash terminal. But in that moment I just needed, in 10 seconds, give me the command.

This is a hilarious situation. But yeah, so what did you use?

So I used the non-thinking, fastest model. It gave me the bash command to chain the different scripts together, and then there's the tee thing, where you want to route the output to a log file. Off the top of my head, in a hurry, I could have thought about it myself, but this was faster.
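The pattern described here, chaining experiment scripts and capturing their output with tee, looks roughly like the sketch below. The script name is a hypothetical stand-in, not the actual command from the anecdote.

```shell
# Hypothetical stand-in for a real experiment script.
run_experiment() { echo "experiment $1 done"; }

# Chain the runs so each one starts only if the previous succeeded,
# and mirror everything (stdout and stderr) into a log file with tee,
# so the results survive even after you close the terminal and leave.
{ run_experiment 1 && run_experiment 2 && run_experiment 3; } 2>&1 | tee run.log
```

Using `&&` instead of `;` means a failed run stops the chain instead of wasting GPU hours on experiments built on a broken step.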

By the way, I don't know if that's a representative case: wife waiting in the car, you have to run, plug in the GPU, you have to generate a bash script. It sounds like a movie, like Mission Impossible.

I use Gemini for that. So I use thinking for all the information stuff, and then Gemini for fast things or stuff I could sometimes Google; it's good at explaining things, I trust that it has this background knowledge, it's simple, and the Gemini app has gotten a lot better. And then for code and any sort of philosophical discussion I use Claude Opus 4.5, always with extended thinking. Extended thinking and inference-time scaling are just a way to make the models marginally smarter, and I will always err on that side when the progress is very high, because you don't know when that'll unlock a new use case. And then I sometimes use Grok for real-time information, or for finding something on AI Twitter that I know I saw and need to dig up. Although, when Grok 4 came out, Grok 4 Heavy, which was their pro variant, was actually very good, and I was pretty impressed with it; I just lost track of it out of the muscle memory of having the ChatGPT app open. So I use many different things.

Yeah, I actually do use Grok 4 Heavy for debugging, for the hardcore debugging that the other ones can't solve. I find that it's the best at that. And it's interesting, because you say ChatGPT is the best interface for you for that same reason, but this could just be momentum.

Gemini is the better interface for me. I think because I fell in love with its needle-in-the-haystack ability: if I ever put in something that has a lot of context, but I'm looking for very specific kinds of information, it makes sure it tracks all of it. I find that Gemini, for me, has been the best at that.
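The needle-in-the-haystack ability mentioned here comes from a standard evaluation idea: bury one "needle" fact inside long filler context and check whether the model can retrieve it. The sketch below builds such a probe; the filler sentence and question are made up for illustration.

```python
import random

def make_haystack_prompt(needle, filler_sentence, n_filler, seed=0):
    """Build a needle-in-a-haystack probe: insert one 'needle' fact at a
    random depth inside repetitive filler context, then ask about it.
    Returns the prompt and the needle's position (for depth analysis)."""
    rng = random.Random(seed)
    sentences = [filler_sentence] * n_filler
    pos = rng.randrange(len(sentences) + 1)
    sentences.insert(pos, needle)
    context = " ".join(sentences)
    question = "What is the secret number mentioned in the text above?"
    return f"{context}\n\n{question}", pos

prompt, depth = make_haystack_prompt(
    needle="The secret number is 7421.",
    filler_sentence="The sky over the harbor was gray that morning.",
    n_filler=200,
)
```

Running the same probe at many depths and context lengths is what produces the long-context retrieval scores the hosts discuss.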

So, it's funny: with some of these models, if they win your heart over with one particular feature, on one particular day, for that particular query, that prompt, you think, "This model is better." And you'll just stick with it for a bit until it does something really dumb. There's a threshold effect: it does some smart thing and you fall in love with it, and then it does some dumb thing and you decide, you know what, I'm going to switch and try Claude and try GPT and all that kind of stuff.

This is exactly it: you use it until it breaks, until you have a problem, and then you change the LLM. And I think it's the same with how we use anything: our favorite text editor, operating systems, or the browser. There are so many browser options (Safari, Firefox, Chrome), all relatively similar, but then there are edge cases, maybe extensions you want to use, and then you switch. But I don't think there is anyone who types the same website into different browsers and compares them.

You only do that when the website doesn't render, when something breaks. So that's a good point: you use it until it breaks, and then you explore other options. On the long-context thing, I was also a Gemini user for this, but the GPT-5.2 release blog had crazy long-context scores, to the point where a lot of people were asking, did they just figure out some algorithmic change?

It went from something like 30% to 70% in this minor model update. It's very hard to keep track of all of these things, but now I look more favorably at GPT-5.2's long context. So it's this never-ending battle of how I actually get to testing it.

It's interesting that none of us talked about the Chinese models from a user perspective. What does that say? Does that mean the Chinese models are not as good, or does that mean we're just very biased and US-focused?

I do think that's currently the discrepancy between just the model and the platform. The open models are more known for the open weights, not their platform, yet. There are also a lot of companies willing to sell you open model inference at a very low cost.

With OpenRouter it's easy to look at multi-model setups; you could run DeepSeek on Perplexity, I think. All of us sitting here use OpenAI's GPT-5 Pro consistently; we're all willing to pay for the marginal intelligence gain. And for anyone asking: these models from the US are better in terms of the outputs. The question is whether they will stay better this year and for years going forward, but as long as they're better, I'm going to pay to use them.

I think there's also analysis showing that the way the Chinese models are served, whether you attribute it to export controls or not, is that they use fewer GPUs per replica, which makes them slower and gives them different errors. And speed and intelligence are the things you want in your favor as a user.

I think in the US a lot of users will go for this, and I think that is one thing that will spur these Chinese companies to compete in other ways, whether it's being free or substantially lower cost, or it'll breed creativity in terms of offerings, which is good for the ecosystem. But I just think the simple thing is: US models are currently better, and we use them. I try these Chinese and other open models and think, fun, but I don't go back to them.

We didn't really mention programming. That's another use case a lot of people deeply care about. I use basically half and half Cursor and Claude Code, because I find them to be fundamentally different experiences, and both useful.

What do you guys use? You both program quite a bit. What's the current vibe?

So, I use the Codex plugin for VS Code. It's very convenient: it's just a plugin, and then it's a chat interface that has access to your repository. I know that Claude Code is a bit different; it is a bit more agentic. It touches more things; it does a whole project for you.

I'm not quite there yet where I'm comfortable with that, maybe because I'm a control freak, but I still would like to see a bit of what's going on. And Codex is, right now, the sweet spot for me: it is helping me, but it is not taking over completely.

I should mention that one of the reasons I do use Claude Code is to build the skill of programming with English. The experience is fundamentally different. As opposed to micromanaging the details of the code-generation process, looking at the diff (which you can in Cursor, if that's the IDE you use), then changing, altering, and reading the code, understanding it deeply as you progress, you're instead thinking in this design space and guiding it at the macro level. Which I think is another way of thinking about the programming process.

Also, we should say that Claude Code just seems to be somehow a better utilization of Claude Opus 4.5. It's a good side-by-side for people to do: you can have Claude Code open, you can have Cursor open, you can have VS Code open, and you can select the same model in all of them and ask questions. It's very interesting. Claude Code is way better in that domain. It's remarkable.

All right, we should say that both of you are legit on multiple fronts: researchers, programmers, educators, tweeters, and on the book front, too. So, Nathan at some point soon hopefully has an RLHF book coming out. It's available for pre-order, and there's a full digital preprint; he's just making it pretty and better organized for the physical thing. Which is a lot of why I do it, because it's fun to create things you think are excellent in physical form when so much of our life is digital.

Going to Perplexity here: Sebastian Raschka is a machine learning researcher and author known for several influential books. A couple I wanted to mention, which I highly recommend: Build a Large Language Model (From Scratch), and the new one, Build a Reasoning Model (From Scratch). So, I'm really excited about that. Building stuff from scratch is one of the most powerful ways of learning.

Honestly, building an LLM from scratch is a lot of fun. There's also a lot to learn. And like you said, it's probably the best way to learn how something really works, because you can look at figures, but figures can have mistakes. You can look at concepts and explanations, but you might misunderstand them.

But if you see that there is code and the code works, you know it's correct. There's no misunderstanding; it's precise, otherwise it wouldn't work. And I think that's the beauty behind coding: it doesn't lie. It's math, basically.

Even with math, though, you can have mistakes in a book that you would never notice, because you're not running the math when you're reading; you can't verify it. With code, what's nice is that you can verify it.

Yeah, I agree with you about the LLM-from-scratch book. It's nice to tune out everything else, the internet and so on, and just focus on the book. But, you know, I've read several history books that way, with an LLM. It's just less lonely somehow. It's really more fun.

For example, on the programming front, I think it's genuinely more fun to program with an LLM. And I think it's genuinely more fun to read with an LLM, but you're right that the distraction should be minimized. So you use the LLM to basically enrich the experience, maybe add more context. The rate of small-scale aha moments for me is really high with an LLM.

100%. I also want to correct myself: I'm not suggesting not to use LLMs. I suggest doing it in multiple passes, like one pass just in offline focus mode, and then after that. I also take notes, but I try to resist the urge to immediately look things up.

I do a second pass. It's just more structured for me this way. Sometimes things are answered later in the chapter, but sometimes it also just helps to let it sink in and think about it. Other people have different preferences.

I would highly recommend using LLMs when reading books. For me it's just not the first thing to do; it's the second pass.

By way of recommendation, I'll say I do the opposite. I like to use the LLM at the beginning to lay out the full context of what this world is that I'm now stepping into. But I try to avoid clicking out of the LLM into the world of Twitter and blogs, because then you're down a rabbit hole.

You're reading somebody's opinion, there's a flame war about a particular topic, and all of a sudden you're in the realm of the internet, Reddit, and so on. Whereas you can purely let the LLM give you the context of why this matters and what the big-picture ideas are. Sometimes books themselves are good at doing that, but not always.

This is why I like the ChatGPT app: it gives the AI a home on your computer where you can focus on it, rather than it being just another tab in my mess of internet options. And I think Claude Code in particular does a good job of making that a joy. It seems very engaging as a product, designed to be an interface through which your AI goes out into the world. Something intangible between it and Codex is that it just feels warm and engaging, whereas Codex, from OpenAI, can often be just as good but feels a little rougher around the edges. Claude Code makes it fun to build things, particularly from scratch, where you don't have to care but you trust that it'll make something. Obviously this is good for websites, for refreshing tooling, and for data analysis, which I use it for. For my blog, we scrape Hugging Face; we now keep the download numbers for every dataset and model over time, and Claude was just like, yeah, I've made use of that data, no problem.

And I was like, that would have taken me days. Then I have enough situational awareness to say, okay, these trends obviously make sense, and you can check things. It's just a wonderful interface where you can have an intermediary and not have to do the awful low-level work that you would otherwise need to maintain different web projects.
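The download-tracking setup Nathan describes can be sketched in a few lines of Python: one dated, long-format CSV of (date, model, downloads) rows, appended to on each scrape. This is a minimal sketch under assumptions, not his actual pipeline; the fetch from the Hugging Face Hub (whose public model API does expose a downloads count) is stubbed out, and the file layout is illustrative.

```python
import csv
import datetime
import pathlib

def append_snapshot(path, counts, when=None):
    """Append one dated row per model to a long-format CSV.

    `counts` maps a model id to its cumulative download count,
    e.g. as scraped from the Hugging Face Hub API (stubbed here).
    """
    when = when or datetime.date.today().isoformat()
    path = pathlib.Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "model", "downloads"])
        for model, n in sorted(counts.items()):
            writer.writerow([when, model, n])

def growth(path, model):
    """Downloads gained between the first and last snapshot of a model."""
    with open(path, newline="") as f:
        rows = [r for r in csv.DictReader(f) if r["model"] == model]
    rows.sort(key=lambda r: r["date"])
    return int(rows[-1]["downloads"]) - int(rows[0]["downloads"])
```

With snapshots appended weekly, `growth` gives the kind of trend line he can then sanity-check against his own situational awareness.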

All right, so we just talked about a bunch of the closed-weight models. Let's talk about the open ones. Uh, so tell me about the landscape of open-weight models. Which ones are interesting, which stand out to you, and why? We already mentioned DeepSeek. Do you want to see how many we can name off the top of our heads?

Yeah. Yeah. Without looking at notes: DeepSeek, Kimi, MiniMax, Z.ai, Ant's Ling. Are we just going Chinese? Um, let's throw in Mistral AI, Gemma. Um, yeah, GPT-OSS, the open-weight model from OpenAI. Actually, NVIDIA had a really cool one, Nemotron 3. Um, there's a lot of stuff, especially at the end of the year. Qwen may be the one. Oh, yeah, Qwen was the obvious name I was trying to get to. You can name at least 10 Chinese ones and at least 10 Western ones.

I mean, OpenAI released their first open model since GPT-2. When I was writing about OpenAI's open model release, people were all like, "Don't forget about GPT-2," which I thought was really funny, because it's just such a different time. But GPT-OSS is actually a very strong model and does some things that the other models don't do very well.

And selfishly I'll promote a bunch of Western companies; both the US and Europe have these fully open models. I work at the Allen Institute for AI, which releases data and code and all of this, and now we have actual competition among people who are trying to release everything so that others can train these models.

So there's the Institute of Foundation Models, or LLM360, which has had their K2 models of various types. Apertus is a Swiss research consortium. Hugging Face has SmolLM, which is very popular. Um, and NVIDIA's Nemotron has started releasing data as well. And then there's Stanford's Marin community project, which is making a pipeline where people can open a GitHub issue, implement a new idea, and have it run in a stable language-modeling stack.

That list was way smaller in 2024; I think it was just Ai2. So it's a great thing that more people can get involved and come to understand language models, and that ecosystem doesn't really have an analog among Chinese companies.

While I'm talking, I'll say that the Chinese open language models tend to be much bigger, which gives them higher peak performance, whereas a lot of the models we like, whether it was Gemma or Nemotron, have tended to be smaller models from the US and Europe. That's starting to change. Mistral Large 3 came out in December, a giant model very similar to the DeepSeek architecture, and then the startup Arcee AI and NVIDIA's Nemotron team have teased models way bigger than 100 billion parameters, in the 400-billion-parameter range, coming on this Q1 2026 timeline.

So I think this balance is set to change this year in terms of what people are using the Chinese versus US open models for, which I'm personally going to be very excited to watch.

First of all, huge props for being able to name so many of these. Did you actually name Llama? Um, no. I feel like that was not on purpose. RIP Llama. Mhm.

All right. Can you mention some interesting models that stand out? You mentioned Qwen 3, which is obviously a standout.

So I would say the year is almost bookended by DeepSeek V3 and R1 on one end and DeepSeek V3.2 in December on the other, because what I like about those is they always have an interesting architecture tweak that others don't have. But otherwise, if you want, you know, the familiar but really good performance, there's Qwen 3 and, like Nathan said, GPT-OSS.

And what I think is interesting about GPT-OSS is that it's kind of the first open-weight model that was really trained with tool use in mind, which I do think is a bit of a paradigm shift that the ecosystem was not quite ready for.
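Tool use here means the model emits a structured call that a harness executes, feeding the result back into the conversation. A minimal sketch of that dispatch loop in plain Python; the JSON call format and the `get_weather` tool are illustrative stand-ins, not GPT-OSS's actual tool protocol:

```python
import json

# Illustrative tool registry: each tool has a description (shown to the
# model at prompt time) and a local implementation the harness can run.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "fn": lambda city: f"Sunny in {city}",  # stand-in implementation
    }
}

def run_tool_call(message):
    """Dispatch one model-emitted tool call (a JSON string) and return
    the tool result as a message to append to the conversation."""
    call = json.loads(message)
    tool = TOOLS[call["name"]]
    result = tool["fn"](**call["arguments"])
    return {"role": "tool", "name": call["name"], "content": result}

# A model trained for tool use emits structured output like this
# instead of prose, and the harness loops until it gets a final answer:
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
reply = run_tool_call(model_output)
print(reply["content"])  # -> Sunny in Zurich
```

The paradigm shift is that this loop has to exist outside the model: inference servers and agent frameworks needed to parse and route these calls, which is the ecosystem readiness gap being described.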
