
Author: Latent Space
Date: December 2025
Quick Insight: This summary is for builders tracking the transition from data-heavy pre-training to compute-heavy reasoning. Ashvin Nair explains why the future of AI depends on tight loops between product context and reinforcement learning.
Ashvin Nair moved from Berkeley robotics to OpenAI’s reasoning team before joining Cursor to build the future of AI coding. He argues that the next leap in intelligence comes from Reinforcement Learning and bringing the entire context of a job into the model’s training distribution.
The RL Pivot "[If you told me that we could have gotten IOI gold then I would have just assumed that we could all just go on vacation.]"
Product-Model Co-design "[LLM agents are going to be like a trillion dollar market before robotics is maybe even like a $10 billion market.]"
The Governance Gap "[OpenAI has a tendency to ship the org chart.]"
Podcast Link: Click here to listen

Okay, we're here at NeurIPS. We're recording special Latent Space coverage of the folks at NeurIPS, and we're here with Ashvin from Cursor. Welcome. Hi, yeah, thanks for having me.
So I guess "Ashvin Nair of Cursor" is kind of a new identity. I didn't even know if I should say that, because you only joined Cursor three months ago. Before that you were at OpenAI, where you worked on o3, and before that Berkeley, a PhD in RL, but focused on robotics.
Robotics, yeah. Is it weird switching from robotics to language models? This is kind of interesting, because a lot of people have been doing this. I mean, OpenAI, yeah, robotics. So actually, I was at OpenAI in 2017 also working on robotics. I was interning right before my PhD, and I worked on robotics there. 2017, was that the same time as the guy who was famously OpenAI's first intern?
Oh really? Okay, then he might have been before me. But yeah, there were like 15 interns. It was a very different company. It was just robotics, Dota, and like 15 interns that summer, all with pretty exciting individual projects. Yeah, that set of interns, if you look at where they are now, it's kind of cool. Anyone from that class you would shout out?
There were just a lot of cool papers that came out. Lerrel Pinto is now at NYU. The person who leads reasoning at xAI, I forget his name. Well, he left. Eric, maybe, but yeah, I forget his name; he worked on K-FAC and stuff, I think. The vision guy, Greg? Not Greg, but yeah. It was an exciting time to be there.
I think robotics is a pretty good fit for LLMs, because the switch ends up being pretty natural: you kind of do similar things. You want to look at a lot of data. It's kind of hard to get stuff working in the robotics world, so it builds very gritty people who look at data a lot, that kind of thing. For whatever reason, that transfers. Yeah, it's happening a lot and I think it makes a lot of sense.
One of my NeurIPS highlights so far: I had a small group dinner with Lex Fridman yesterday. Lex used to be in robotics, and his assessment was that robotics people are the best to talk to at NeurIPS because they're the most well-rounded. He says it's because they don't have a choice; they work with real-world data. And the most unhinged, the most detached from reality, are the simulation people.
I see. Yeah, I think I agree. I actually did a little bit of both during my PhD: prototype ideas in sim, then get them working on real-world robots. And robotics is probably where you feel the AGI the least, right? Because it's just so far away from working. Though over the last year there have been demos that have been super interesting, from Physical Intelligence and Sunday and others, where I'm starting to be like, okay, this kind of stuff.
Have you seen the Sunday robots yourself? I haven't, no. Apparently they've been doing demos and I'm pretty keen to see them. I've seen the Physical Intelligence ones live, and yeah, it's pretty impressive: in someone's living room, folding laundry and stuff, you can just toss things in there.
Okay. And last thing on robotics, then we can pivot to o3: OpenAI is restarting a robotics team. Is that serious?
I actually know very little about it because I was in a pretty different part of the org. I think it's serious. There's a ton of excitement around robotics right now, and I'm actually kind of curious what drives it, because I don't think I fully understand it; there have been crazy raises for robotics companies recently. My own view, and when I left robotics in 2022 I thought I would eventually come back to it, is that LLM agents are going to be like a trillion dollar market before robotics is maybe even like a $10 billion market.
That's just because LLM agents already create value out in the world. With robotics, it's kind of hard to make the case that AI robotics does anything that useful yet. And once it does something useful, you then have to make the unit economics work out, and I think that's also quite hard: reliability, these robots have to be fixed, those kinds of things. So I would say the market is kind of efficient, in that the software LLM companies are raising tens of billions and the robotics companies are raising hundreds of millions.
Though very recently it's been single-digit billions. Oh really? Okay. So maybe the surprising thing to me is that it feels like the funding is ahead of where the technology actually is. I would say robotics is in the GPT-1 to GPT-2 era right now.
Okay. I haven't worked on robotics; what task would qualify as, oh, that's the inflection? It's a little bit you-know-it-when-you-see-it. I thought the Sunday demos were kind of cool, like maybe it's starting to get there, and the details matter a lot: it has to be in a new scenario, one you haven't seen before, and somewhat general. Yeah, exactly. And I think that was kind of what GPT-2 was too, right? You start to see hints of cool generalization. And that's fine, it doesn't have to work out of the box, but at this point it still feels like in robotics you're not exactly investing in a technology, you're investing in a team.
Yeah. I'm not in the space whatsoever, but that's kind of my impression. It's actually nice when you're not in it, because you know about as much as basically everyone else, so we can just speculate. Exactly. Apparently there's a robotics team at OpenAI. So, coming back to language models: did you join for o1, or were you there before?
I joined right before ChatGPT, in I think September of 2022. Yeah, actually, I was pretty burnt out from my PhD and I was like, okay, I'm going to go to this chill research lab, and then ChatGPT happens and everything kind of blew up and a lot of stuff got refocused. I guess ChatGPT obviously surprised OpenAI too; what did they tell you they were looking for you to do, before it all changed?
So I joined the codegen team, Codex. Yeah, Codex, exactly. It was the team that shipped Codex, but by the time I joined we were more so working on the model, doing tool use and those kinds of things. So very related to chat; we were kind of a sister team to the team that made ChatGPT. Yeah, exactly. We were just working on making the models smarter: programming competitions, how to do SFT for that, that kind of stuff.
IMO or IOI gold must have felt unreachable at that point. Oh yeah, like crazy. This is something I repeat to people again and again these days: if you had told me that we could have gotten IOI gold, I would have just assumed that we could all go on vacation. It's all over, AI is solved, no point in working anymore, we got it. And yet it feels like nothing that much has changed, right? Life is still the same.
Yeah. So I think that's super interesting. I don't have a great way to explain it, but that's actually what I spend a lot of time thinking about: why is that the case? You see this again and again in AI, right? Solving chess, and then it doesn't really matter; solving Go; you keep seeing it, but it surprises you every single time.
I think one reason is that we keep moving the goalposts. Yeah, we're very good at that. And two, I think our definitions of what constitutes AGI are just bad; we don't actually mean what we say when we say "once we've achieved this, we have AGI." Clearly "once we've achieved IOI gold with a language model, we have AGI" is wrong. Yeah, and I think shifting the goalposts is to some extent correct: we keep Goodharting whatever goalpost we set. Though "Goodhart" is maybe too negative; it implies "I will cheat to do what you asked me to."
Mhm. But I don't think it was cheating; it was just scaling test-time compute. At a meta level, I think the community, not cheating exactly, but making a lot of implicit decisions, goes after the eval benchmarks that matter the most. SWE-bench Verified, for sure. Yeah, exactly. But hopefully not that Goodharted.
Well, but it kind of clearly is to some extent, right? Most programmers in the world cannot do IOI at any decent level, but we're still struggling to automate most programming jobs; there's a lot of stuff left to do. It's like language models are here on the junior-to-senior dev scale, and then suddenly for IOI they're way up there. Exactly, and there's something suspicious about that.
Okay. I kind of saw this at a meta level with RL research too. I did my PhD with Sergey Levine at Berkeley from about 2017 to 2022, and that era of RL research was super interesting because it was super hyped, right, starting from around DQN in 2015. And a lot of the methods people were really excited about, like off-policy learning and value functions, somehow that stuff hasn't really panned out, I would say. It's not exactly clear why, but in the academic literature we thought we were making a ton of progress.
In retrospect, I'd say we probably overfit to the benchmarks pretty heavily. The way I see it now is that we gave ourselves a lot of new knobs to tune and then implicitly tuned those knobs to fit the benchmarks. Everyone knew we were doing that at some level, but it's hard to appreciate that it's not just happening within a single paper; at a meta level it's happening for the whole community too. Yeah. And the result is that a lot of the RL research that came out of that era isn't that used, and I think it's for a similar reason: we were basically benchmark-maxing.
I will full-out say there was an RL winter, right? Entire startups that were founded on that premise at the time basically gave up. Some of them died, some pivoted, whatever. Yeah. Because I was in academia, there was still quite a lot of excitement over it, but it still felt quite academic, and I was a little frustrated in that era, because I felt like one of the pitfalls of academia is that it doesn't really reward simple ideas that work; it tends to reward math-y ideas instead.
Those math-y ideas also give you these implicit knobs to tune that let you overfit, while the things that actually work tend to be the simple ones with fewer knobs that just generalize to many things. There's just less secret sauce to it, apart from throwing a lot of compute at it. Exactly. But those things tend to be not intellectually interesting. Yeah, exactly. And from an academic point of view it's like, oh, why am I sitting in school?
Yeah, I think a lot of people who do PhDs are wired so that they want to think about interesting new stuff, and the scaling era kind of sucked for that. The scaling era. Is the scaling era over, since we're on the topic?
I don't think it's over, but there's definitely something interesting happening, right? The thing I was saying about IOI and IMO. I think we'll still continue more or less on the same track: clearly these labs are releasing their new pre-trained models and they're still doing much better than before. So scaling is still happening, but it's happening in a different way, and it's worth seriously interrogating why we're not just automating all jobs right now.
I think my view is that RL, the way it's applied to LLMs right now, is kind of a weird, funny tool in that it doesn't really generalize beyond the training distribution that much. It generalizes to some extent, and in interesting ways, but it's very peaky, right? It can kill the training distribution completely, it can be best in the world at it with not that much effort really, but it doesn't really generalize. So I think what we have to do is bring the world of economically useful tasks in-distribution for RL, if we commit to using RL as the tool. And it might be the case that there's some cool continual learning thing or something that shifts the paradigm next year.
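Editor's note: to make the "peaky, in-distribution" behavior concrete, here is a minimal toy sketch, not anything from OpenAI or Cursor, with entirely made-up tasks: a tabular REINFORCE policy trained against a verifiable reward (an answer checker). The prompts it is trained on get mastered almost perfectly, while an unseen prompt stays at chance. A tabular policy exaggerates the effect, since real models share parameters across tasks, which is where the "generalizes to some extent" caveat comes from.

```python
# Toy sketch of RL on verifiable rewards (hypothetical tasks, tabular policy).
import numpy as np

rng = np.random.default_rng(0)

# Made-up training set: (prompt id, index of the verifiably correct answer).
train_tasks = [(0, 2), (1, 0), (2, 3)]
n_prompts, n_answers = 3, 4
logits = np.zeros((n_prompts, n_answers))  # one categorical policy per prompt
lr = 0.5

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(500):
    prompt, correct = train_tasks[rng.integers(len(train_tasks))]
    probs = softmax(logits[prompt])
    action = rng.choice(n_answers, p=probs)
    reward = 1.0 if action == correct else 0.0    # checker-style, verifiable reward
    grad_logpi = -probs
    grad_logpi[action] += 1.0                     # d log pi(action) / d logits
    logits[prompt] += lr * reward * grad_logpi    # REINFORCE update

# Near-perfect on the training distribution ("peaky")...
print([round(float(softmax(logits[p])[c]), 3) for p, c in train_tasks])
# ...but a prompt that never appeared in training is untouched: still uniform.
print(softmax(np.zeros(n_answers)))
```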
It really feels like, if RL is the tool, then a big thing that needs to happen is, well, it doesn't feel like the intelligence of the models is the bottleneck. It's more that you need products that bring the entire context of what someone wants to do into the product, so that the LLM can see it, and then you need to RL on top of that. Yeah. Have you seen GDPval?
Yeah, I've seen it. Is that basically what you're envisioning? Yeah, actually, I haven't looked at GDPval closely. Roughly recapping: it's a set of real tasks drawn from white-collar occupations in the sectors that each account for more than 5% of GDP. They basically gathered all the context, built the eval on it, and evaluated every model, and famously, even though OpenAI runs the eval, it keeps finding that Anthropic's models perform best. Yeah, props to them for publishing that; it's actual science, and I think it's good. But in the sense of generalizing beyond coding competitions to economically useful tasks, that is what matters more for the next generation of models.
Yeah, I'd like to; I just haven't read the GDPval traces closely. It's not clear to me what the job of an accountant entails and what kind of context needs to be in the product so that you can do it. PDFs. They try to go as close to the source documents as possible. I see. Yeah. So roughly, operating on that kind of thing is what I envision, because it can't be artificial, like, let me clean up this data for you to make it easy for the LLM to process. No, here's a PDF and an agent, go.
Yeah, I think that's roughly the right shape of the thing. And the way I imagine this being operationalized is that you'd want to co-design the product and the model. Coding is maybe the easiest first step, because most of the context you care about is just your codebase and being able to run stuff in the terminal, and still we're not that close to automating it, necessarily. But for all the other jobs, the context is insane, right? It's all the conversations you had with your co-workers, your Slack messages. For example, at OpenAI I was working on hyperparameter scaling research, and I actually wrote not that much code.
Grid search, or neural architecture search? No, more like the science of deep learning, the way around 2020 it was like, oh, you have to initialize the layers in a particular way to get good scaling; kind of the analog of that for RL. The thing is, I didn't write a ton of code, so for the LLM, writing code is not the bottleneck. It's more that over the course of a year I ran sweeps, looked at the interactions between different hyperparameters, and built up that knowledge, a year of different graphs. To do my job, the model would also need all of those things in context to successfully automate it, and you'd want a product that lets you bring all of that context in.
Did you have to build that for yourself, or is there an existing one? No, those graphs were just sitting in my head, right? So I think it would be pretty hard to go automate that job. What you need to do is build a product that brings that context in, and then you want to RL on top of that to teach the models to use that context.
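Editor's note: for the "initialize the layers in a particular way to get good scaling" remark above, here is a small illustrative sketch, not Ashvin's actual research, of the kind of effect such work studies: with a fixed initialization scale, pre-activation magnitudes grow with layer width, while a 1/sqrt(fan_in) scale keeps them roughly constant across widths. All numbers are made up for illustration.

```python
# Sketch: why init scale has to track width (fan-in) to keep activations O(1).
import numpy as np

rng = np.random.default_rng(0)

for width in [128, 512, 2048, 4096]:
    x = rng.standard_normal(width)                                  # unit-variance input
    w_fixed = rng.standard_normal((width, width)) * 0.05            # fixed init scale
    w_fanin = rng.standard_normal((width, width)) / np.sqrt(width)  # fan-in-scaled init
    print(width,
          round(float(np.std(w_fixed @ x)), 2),   # grows roughly like sqrt(width)
          round(float(np.std(w_fanin @ x)), 2))   # stays roughly 1 at every width
```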
Another conversation that I think has really come to a head this year is the death of one-model-fits-all. I feel like the point of the G in AGI is one model fits all. Mhm. I think OpenAI has clearly abandoned that this year. Oh, why do you say that? Fidji wrote a blog post basically titled that we're no longer doing one-model-fits-all. Okay, interesting. And I think Mark Chen, or one of the other senior people who aren't Sam, also said this on a podcast.
So basically the idea was: you started with Codex, someone else was doing InstructGPT, then we launched GPT-4, then o1, and o1 was kind of supposed to be a reasoning one-model-fits-all, then we merged the 4o and o1/o3 lines into GPT-5, and now we're splitting it out into GPT-5 and GPT-5-Codex again. It's just weird. OpenAI is very guilty of this. I mean, I don't think you should interpret those as scientific facts about the universe. It's more that OpenAI has a tendency to ship the org chart, basically. Yeah, right, the whole world has that tendency. Exactly. So I think a lot of it is related to that.
But yeah, I see what you mean. I do wonder whether the current reasoning paradigm is just fitting itself to this peaky-in-certain-areas thing, right? I don't think it's so much a matter of model capacity, though. It's more another organizational thing: if you care a lot about coding, you probably don't have the data to do all the other stuff. If you had all the data, you'd probably benefit from just training on all of it, and you'd get some generalization between these areas, but it's hard to find one organization that cares about all of them at once.
Yeah. So before I double-click on the o-series at OpenAI, I do like to ask OpenAI people who were there: do you have a favorite blip story? Yeah, the blip was crazy for me. It was around Thanksgiving. Everyone remembers where they were and what they were working on. Yeah, exactly. I was at Thanksgiving with two OpenAI friends, actually, and then one of them on Friday afternoon is like, oh, Sam Altman just got fired. I'm like, what? Haha, good joke. And then, yeah, it was crazy, just a crazy weekend of ups and downs. You signed the letter?
Yeah, I did; like 95% of people signed it. And you thought you'd move on to Microsoft? Well, maybe I had a slightly more complicated view. I actually do think governance feels really important to me, because whether we hit AGI in two years or ten or whatever, it's not clear that we have a good structure for the governance of it. It's a question I think we should probably spend more time on. And during that period I was pretty willing to say, you know what, let's forget about the equity and stuff; I think it's good and healthy to have a conversation about how exactly the governance should work.
Okay, so you care about this. Right. So now the OpenAI nonprofit has this kind of secret shadow board of members that determines when we've reached AGI. Is that better? Yeah, I would say I don't have an answer; it's above my pay grade. And even back then I was kind of like, well, not that I don't care, I do care quite a lot. When the blip happened, one of my reactions was: this nonprofit board setup, if it takes such surprising, maybe erratic actions, maybe you'd rather just have something like the Microsoft board, which represents, you know, probably all the pensions of the world. Serious people, but also the stakeholders are kind of the whole world, because everyone is invested in it through their pensions or something. Maybe that is a bit more of a democratic way to run things than having seven people run it.
But yeah, I don't really know. It feels like we haven't solved governance at all, though, right? Forget AI; even for things like unhealthy food or social media, it feels like whatever the capitalistic incentive is doesn't actually capture good outcomes for society, maybe. Yeah. So, about the transition into reasoning: you shocked me by mentioning that the reasoning team is 300 people. It's, you know, once o3 shipped as a product, the number of people who work on it just gets larger and larger. I've lost track of the numbers, but a lot of people contribute to different aspects of it, like safety and evals.
The original o1, I saw the video, it was like a dozen people. Well, even then, if you look at all the contributors, it was probably more like 50 to 200 people. Okay. So let's tell that story from your point of view: figuring out what RL means there. And was this a branch of any prior work that you want to credit? Yeah. So, setting the scene: in 2023 people were talking about, oh, is scaling dead, are we hitting scaling walls, that kind of stuff. Every year. Yeah, every year, but especially that year it felt pretty serious, you know.
I think in general OpenAI is really good at having conviction in something and really going after it from first principles. And the people most responsible for that are probably Ilya Sutskever and Jakub Pachocki. Dota was more or less the same template in some ways, right? That was 2017, and so a lot of the people there have this AGI-in-their-bones point of view. And they had basically been convinced that RL would be the way to get there. So for a long, long time people were convinced that something like that should work; it just started to work once pre-training got good enough.
Okay. Yeah. I think human feedback is a bit of a side branch, because you can't really pour that much compute into it, right? You take the model and you elicit it to be a little bit better in terms of personality. But the people there were really convinced that at some point it's not about copying the internet: you can do RL, and that's the path to getting much better intelligence. So there's a long line of returning to RL in different ways, and it's just that around 2023 it started really clicking.
It was kind of interesting, because it's not like those initial models performed way better than the existing models, since they were smaller scale, but people were very good at recognizing, oh, this is kind of interesting: the reasoning trace you see here is not something you've really seen be so accurate in other models. It's kind of similar to how a lot of people didn't think of GPT or GPT-2 as something super compelling. I know I personally didn't rate GPT-2 that much; I was like, okay, whatever. And then GPT-3 happened and I was like, whoa, I feel a lot of FOMO sitting here in my PhD. It's kind of like that: it takes a bit of first-principles conviction to decide that there's something here and we should really go scale it up. And OpenAI is really good about that; once you decide that something is good, you scale it up all the way.
Yeah. Was there an internal prototype, pre-o1, that was like, okay, this is the thing, we'll fund it and scale it up? There usually is. Yeah, exactly. What was the demo that really sold it? Just, you know, running RL on even a pretty small model, producing very interesting reasoning traces and getting surprisingly good scores on math, in a way that we couldn't have done without a bunch more pre-training. And once that looked good, more and more resources went into scaling up along that new scaling law, and into things like adding tool use and that kind of stuff.
A lot of headlines go to the large models, but I think the minis are very underappreciated, how well that works. Any comments or discoveries on that?
Nothing much to say there; I also wasn't super involved in the mini stuff. Maybe one thing, not exactly related to that: externally, people seem to think research comes in these big leaps. Okay. But internally at OpenAI it feels very smooth. You have a bunch of experiments, some of them have inconclusive results, but maybe you stack them. Yeah, exactly, you stack them, and you keep scaling, you keep having different runs that each get a little better.
Okay. So I think that's maybe one other aspect that's a little underappreciated: in the media there are these wild swings between "we're so back" and "it's so over." Yeah, exactly. And internally at the big labs it's just, oh, we're chugging along, maybe this month is a little better than last month, but it's not as crazy up and down. I think the thing is, there used to be more of that and now there's less. It used to be that for the stuff that was released, internally you were like six months ahead; part of the reason people at OpenAI weren't that excited about ChatGPT's launch was that you already had GPT-4, so they were like, oh, we just put this out, we're already way ahead. I think now people are just releasing things as they have them. Yeah, especially because there's competitive pressure, right? People are probably pretty worried that if you let a lead linger for too long, it will grab a lot of market share. Like Nano Banana Pro right now is probably, you know, about a month ahead.
So I would say the internal-to-external lead time is now about one to two months. Yeah, exactly, pretty short. Tiny. Anything else on the reasoning side? I guess you can talk about the work on coding. Anything that surprised you, or any external misconception about o1 and o3, before we go to Cursor?
Well, not really. I mean, it's pretty cool. By maybe early 2024 it already felt like, oh wow, this recipe really works and we can see how far we can take it. So it was very steady progress, and by that point it was probably pretty predictable that we could really smash things like IMO or IOI.
One funny thing that happened while this was going on: I went to this conference called The Curve, which is about AI progress, with Joseph Gordon-Levitt and people like that. Yeah. I went last year, before the o1 stuff was released, and there was a session where people were making bets on where we would be on Epoch AI's math benchmark and Humanity's Last Exam and things like that. Their estimates were, oh, we'll be at 10 to 20% in 2027, and I think at the time there were models internally that were already better than their estimates.
So they were off by, like, two years or something. And the interesting thing is that these are also people who are predicting there will be Dyson spheres by 2035 or something. Okay. So you see what I'm saying: their near-term estimates are way under. Yeah. Too pessimistic in the short term, too optimistic in the long term?
Yeah, well, I don't know; there might be Dyson spheres by 2035, I don't rule it out. But I think that is one interesting aspect: people still seem pretty miscalibrated in different ways. I do really appreciate how that community makes predictions, though, because most of the rest of the world just cynically says, oh, I saw this coming the whole time. I do appreciate that. Is this EA-adjacent?
Yeah, it's that group, exactly. I like that they register their opinions ahead of time, and I think broadly the capabilities predictions in that group have been correct, if you look from about 2015 to 2020 or so, when a lot of people thought AI was a sham or not going to be that useful for a long time, and actually it's somewhere in the 2030-ish range that it will probably reach human-level intelligence.
Yeah, it's weird. I feel like a skeptic when I keep saying that everyone always predicts AGI happens in their lifetime, which is very convenient for whoever's predicting. And we have a consistent view of history where you see people in the 1800s and 1900s making predictions, and whatever the thing is, it somehow always lands in their lifetime. But this time it might actually happen, almost surely, right? I don't know. I'm pretty sure it will here. Yeah. So it's an interesting observation: how different are we from our predecessors in terms of developing a technology?
Did the DeepSeek moment this year, also crazy, change anything internally? Not really. I think it was more just surprise that it created such a moment; it was kind of confusing, right? DeepSeek shows that Nvidia chips are actually more useful than previously thought, and Nvidia's stock goes down a bunch. I'll steelman that side: you don't need the top-of-the-line Nvidia chips; you can use the previous generation, or the restricted ones they sell to China, to do an equivalent amount of work for a reasoning model.
I see. Yeah. But then I guess the feeling at OpenAI was also that, well, we had a better model already at the time, right? And smarter models were clearly quite valuable, so you kind of wanted to be at the frontier. Okay. I wasn't quite framing this as a race-dynamics thing between labs. It was more like, well, were they right? Were their approaches right? They had R1-Zero, which is kind of a really cool branch. So more like commentary on what we learned about RL this year.
Yeah. Well, it does seem like a lot of the labs have basically converged onto some similar-ish way of doing RL, and they're all kind of back at the same level at the frontier again. Even the Anthropic models, like Opus 4.5, have this kind of...