Epoch AI
January 29, 2026

AI math capabilities could be jagged for a long time – Daniel Litt

The Zero Marginal Cost of Conjecture: Why AI Math is Jagged by Epoch AI

Author: Epoch AI

Date: October 2023

Quick Insight: This summary explores why AI dominates math competitions yet hits a wall in original research. Readers will learn how the collapse of trial costs changes the scientific discovery process for builders and researchers.

  • 💡 Why is solving a 40-year-old math problem sometimes boring for an AI?
  • 💡 What is the secret ingredient humans have that models currently lack?
  • 💡 How does the Open Problems Benchmark prevent AI labs from moving the goalposts?

Daniel Litt joins Epoch AI to discuss why models crush the International Mathematical Olympiad but stall on original research. The conversation highlights a future where the cost of mathematical curiosity hits zero.

The Marginal Cost of Curiosity

"The marginal cost of trying something is getting very small."
  • Trial Cost Zero: AI makes running a thousand experiments as easy as one. This turns idle questions into active data points.
  • Rapid Search: Automated search for cool stuff replaces manual labor. Builders can now explore high-dimensional spaces without the tax of boredom.
  • Labor Evolution: Math becomes a continuation of computing history. This moves the human role from grinder to curator of taste.

The Intuition Bottleneck

"AI can't understand something for me."
  • Embodying Understanding: Human mathematicians provide the "so what" that text cannot capture. This preserves the value of human capital in a world of automated proofs.
  • Long Term Marination: Models lack the four-month period required for deep breakthroughs. This creates a moat for researchers who develop spatial intuition.

The Benchmark Trap

"Benchmarks primarily end up measuring knowledge."
  • Knowledge vs Reasoning: Models solve hard problems by remembering literature rather than thinking through steps. This masks the true gap in autonomous reasoning.
  • Verifiable Progress: The new Open Problems Benchmark targets problems with no known human solution. This forces models to innovate rather than just recite.

Key Takeaways

  • 🌐 The Macro Trend: The collapse of trial costs turns scientific discovery into a search problem.
  • ⚡ The Tactical Edge: Prioritize verifiable problems where AI can provide a clear reward signal.
  • 🎯 The Bottom Line: AI will solve mildly interesting problems soon, but the Big Ideas still require human marination.

Podcast Link: Click here to listen

You have a new tool. It's a lot of fun to use, but I think a lot of people are talking about accelerating science. I'm not necessarily skeptical that that's going to happen in some sense, but I think they're talking about it without trying to rigorously run tests as to how much the tools are accelerating things.

Anytime I think about an open problem, the first thing I do is ask models for some ideas. They're almost always sort of nonsense. I've never gotten an idea that passes the sniff test for a deep open problem.

It's weird that we haven't seen a lot of mildly interesting conjectures resolved by AI. I guess we're now starting to see those arguably. So I think right now at least where I expect AI to have a significant impact is that the marginal cost of trying something is getting very small.

Hello everyone. I'm Greg Burnham. I'm a researcher at Epoch AI and this is my colleague.

I'm Anson. I'm also a researcher at Epoch AI, and we're joined today by Daniel Litt.

Hey, nice to see you guys and meet you in person. I'm Daniel Litt. I'm a professor of mathematics at the University of Toronto.

I wanted to start with sort of a fun one. Could you characterize what's the hardest math problem that AI systems today can solve? However you might think about that.

That's a good question. So I think okay, so of course there are these examples where now every frontier model basically has gotten a gold on the last IMO. And I think that's a pretty good baseline for what you should expect.

So I think probably later in the discussion we'll discuss a few open problems that have been resolved either with the help of AI or autonomously by AI, and I think it's probably accurate to say that those are about at the level of some kind of mid-tier or low-tier IMO problem.

Gotcha. I think there's some evidence that systems can do a little bit better than that right now. And I think you know with some work one can probably even elicit better performance than that from current gen models.

But I think that's overall like about the level of a contest problem. Something you would expect a strong high school student or undergrad to solve in a couple hours seems to be about where the systems are at.

And one thing you've stuck your neck out on is that within the next year we might see like mildly interesting conjectures resolved. Maybe you can say what that means.

So for me I think what I mean by a mildly interesting conjecture is something that someone has stated in print. So at least one person was really interested in it and hopefully someone has spent at least a few hours thinking about it.

I think there are a good number of problems at that level. I would expect that current-gen systems can resolve some of these things with, you know, pass@1 million or something like that.

Gotcha. I think arguably some recent examples actually fit that bill. So maybe that prediction where I stuck my neck out has actually come true, although there are some questions about exactly how interesting such problems are, right?

Very nice. You've actually referred a couple of times already to the sort of time frame for a human to solve a problem; I was going to get around to that eventually. Is that a metric you like in general? I'm curious what you think about that.

To be honest, not really. I think that's not a great way to think about difficulty. So, right, an IMO problem: a strong high school student, a very strong high school student, the best students in the world, have about an hour and a half to solve the problem.

That definitely gives you some upper bound on difficulty. You know, I think that for a lot of those problems, if you gave them to a professional mathematician, they would actually take longer.

Sure. And that's just because, you know, they'd fiddle around. They wouldn't be motivated in the same way as in a contest. And also, solving a contest problem, there's a certain very constrained set of techniques you can use, and in research mathematics you're not constrained at all.

So you don't just try the tricks and see what works. You try everything and you fiddle around and maybe break out a computer and work out some examples or whatever.

So yeah, difficulty, it's sort of a funny thing. How do you judge it? And a lot of the time maybe the best way to judge it is after the fact. You look at the structure of the proof and you say, "Oh, well, actually this wasn't that hard."

I think that's a little dangerous, right? Especially when we're talking about evaluating a model. I think that leads to goalpost moving. The model comes up with a proof and you say, "Oh, well, that wasn't that hard. I didn't have to do anything. I just pressed a button."

One thing you've said a lot on Twitter is that the amount of uplift you get from these systems is actually quite limited. But then I think there are also some other people who feel like the uplift is quite a lot bigger.

I'm curious how you would characterize what the difference is, what's going on.

Yeah. I mean, I do think there are some areas where the models are just better. So if you're working in optimization, for example, my sense is that there are a lot of people, for example at OpenAI, who are experts in that area, and they've generated a lot of data and probably used their human expertise to guide the models. So I would not be surprised if people in that area are getting more out of the models.

Comparatively, my sense is that in algebraic geometry and number theory the models are just not that good. I think also there are areas which are more amenable to the tool use the models have access to.

So, you know, if you're trying to come up with a counterexample to some inequality or whatever, writing code is a very natural thing to do. If you're trying to come up with a counterexample to a conjecture about the intermediate Jacobian of a cubic threefold, probably there isn't really any code you can write that will help you.

That's interesting. That said, I also think probably people are just misreporting the amount the models are helping them. You know, you have a new tool. It's a lot of fun to use.

But you know, I think a lot of people are talking about accelerating science. I'm not necessarily skeptical that that's going to happen in some sense, but I think they're talking about it without trying to rigorously run tests as to how much the tools are accelerating things.

I'm very happy to believe that some frictions are being removed, but there are also a lot of bottlenecks in research that I think the models just don't touch. So are you really accelerating if you've removed a bottleneck to opening a paper and finding Lemma 3.7, but not to having a good idea?

Yeah. This is why we're glad to have you tell us what's right and what's wrong.

Yeah. One thing I find interesting about that is that the typical way I would characterize differences in capabilities, or the spikiness of AI capabilities across domains, is through something like Moravec's paradox. But then it seems like for things within math we also get this kind of spikiness within math, and I guess we were kind of alluding to some of this coming from what training data there is.

Some of it is coming from, you know, what is more amenable to AI. Do you think that explains most of it? Do you think there's anything else, or how would you characterize the relative importance of these factors?

Yeah, I mean, what I do think is that doing research math is a skill that's a little different from contest math, in that it's a very high-dimensional skill. There's not some more or less finite set of known techniques that are useful.

I mean, okay, sometimes contest math requires a little bit more creativity, but to be honest the models have not really succeeded at solving problems where I think one can argue that that's required.

So yeah, I think a lot of the jaggedness we see in math is just the same jaggedness we see everywhere. I don't think there's anything special. In my view, the biggest obstacle to the models autonomously doing high-quality math research is just the same as the biggest obstacle to automating anything, which is that they can't do sort of long-context tasks, you know, something that would take a human six months.

There's just no task that takes a human six months that the models can do. So I think once we start seeing the models performing, say, software engineering tasks at that scale, I would not be surprised if they also start doing high-quality math research.

So I just actually don't think there's anything special about math in this regard.

Makes sense. Yeah. Well, one sort of galaxy-brained take I have, which I hold lightly, is that models are weak at spatial reasoning, visual-spatial intuition. That's not the galaxy-brained part.

The galaxy-brained part is that that is a big latent factor explaining what they're good and bad at. I do know from hearing narratives of various mathematicians about their thought process that often there is something quite spatial or geometrically intuitive about a reasoning process.

And I do wonder a little if the AI models are especially good when there's a more symbol-manipulation approach to solving something, coding being the most obvious, but even just working through algebra might be another.

To be honest, I'm a little bit skeptical of that kind of explanation. First of all, there's a lot of diversity in how mathematicians think about mathematics, so some of us are, whatever, shape rotators, and some of us are more like wordcels. I was curious, actually, trying to question this belief, whether there are aphantasic mathematicians, and quite famously there are some, even geometers.

Right now there's a huge number of mathematicians working on problems, and we have lots of different approaches. That's partly why there are mathematicians who are overall, if you tried to put all mathematicians in a line, way better than me at math, and nonetheless I'm proving some theorems which they haven't proved, or probably couldn't prove in the same amount of time. It's just because we have a different approach.

So I think also, now there are, whatever, like three or four models that can reasonably well solve math problems. They have slightly different approaches, but a much smaller diversity of approaches than humans. And I think we actually see that reflected in the benchmarks.

It seems like there's actually quite a lot of commonality between the models in the set of problems being solved; that's my understanding from hearing about the analysis you guys have been doing. Whereas, of course, all the problems in the benchmarks have been solved by at least one human.

Yeah, so my sense is that you should think of the model as like a mathematician, and so there are certain problems that they'll be quite good at and certain problems that they'll be quite bad at, but maybe we shouldn't read too much into what those problems are. It's just an artifact of the fact that there are only two or three models to look at to begin with.

Since you mentioned it, it's a big question that is very hard to answer, but I'm curious: how much transfer do you think we're seeing in capabilities across different subfields of math?

So, presumably you get an edge from generating synthetic data. How big an edge? As capabilities grow, do you not really need that? Do you have any sense of this?

Yeah, it's a good question. I mean, I think it's sort of hard to say. My sense is that most of what happens when you try to get a model to prove a statement in algebraic geometry is that it tries to find it in the literature, or find something very close in the literature, and then try to make one or two reasoning steps beyond that.

Compared to what happens when you ask it a combinatorics question, it's sort of not, at least in my observation, making the same kind of real attempt to solve the problem.

Interesting. And I think, compared to a graduate student who knows all the stuff the model knows about algebraic geometry, number theory, or sort of fancy math topics, the student would be able to do a lot more reasoning and really try to prove theorems.

So it seems to me like the models are sort of superhuman in some mathematical subjects in terms of knowledge. But whatever reasoning capabilities they have, they haven't necessarily learned the same techniques that a student who has that same knowledge base would know.

This is just based on vibes. Totally. Of course. But I don't know. We've chosen you to hear your vibes.

What areas do you have the sense the models are stronger in, in terms of native reasoning?

Yeah, I mean, okay, I don't know about superhuman, but they're definitely super-me at, like, proving an inequality, for example, that kind of thing. My guess is it's just sort of easier to generate data, and there's probably a lot more data in that area than in algebraic geometry.

And when you say an inequality, are we talking like contest-style inequalities? Not like something a little more interesting or important from analysis, where everything's an inequality.

Yeah. So something where coding is useful. They're typically very strong. Like, every once in a while I'll need to prove an inequality, and now my first step is just to sort of explore what the space looks like by writing some code using a model.
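
(A rough illustration of this exploratory use, as a minimal Python sketch: before attempting a proof, sample the inequality numerically and look for violations. The AM-GM inequality below is a stand-in example chosen for this write-up, not a problem from the conversation.)

```python
# Minimal sketch: probe an inequality numerically before trying to prove it.
# AM-GM for three positive reals stands in for the inequality of interest;
# a real counterexample hunt would swap in that inequality and report the
# worst violation found over many random samples.
import random

def am_gm_gap(x, y, z):
    # Arithmetic mean minus geometric mean; nonnegative wherever AM-GM holds.
    return (x + y + z) / 3 - (x * y * z) ** (1 / 3)

worst = float("inf")
for _ in range(100_000):
    x, y, z = (random.uniform(1e-6, 10.0) for _ in range(3))
    worst = min(worst, am_gm_gap(x, y, z))

# Never negative here, so the sampling turns up no counterexample.
print(f"smallest gap observed: {worst:.6f}")
```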

I think we'll get into this a fair bit more, but you did mention two things I thought were maybe a little in tension. One is that a thing the models are missing is having a good idea, but another was that if they can just do things for six months in many domains, maybe they'll be good at math as well.

I say these are in tension because I think there are six-month projects humans do that don't require brilliant flashes of insight or anything, planning a wedding or something, which models could not do today and which take months. So do you think creativity comes just with time, or, I don't know, how do you see it? That's the question.

I agree that those two statements were kind of in tension. What I would say is that I think there's some kind of continuum between just applying a technique that's well known and developing a new technique. Arguably a new technique is, you know, you take a hundred different things and put them together in some way.

And yeah, presumably at least one ingredient to doing that is just time. Sure, fair enough. It's just not clear to me whether it's the only ingredient.

I would say my experience of doing math is very rarely that I have a brilliant idea that just solves the problem. Of course, sometimes you wake up in the middle of the night and the problem is solved. But typically, for that to happen, you have to marinate in the problem for four months.

So there is some secret ingredient of time. I don't know that my introspection is well-developed enough, or trustworthy enough, to say whether that's the only real ingredient.

I mean, okay, let me amend that slightly. There are other things that happen, like you develop philosophies or analogies, and there's some kind of mystical aspect to doing mathematics that I think we haven't seen the models do. But it's also sort of BS, right? That mystical aspect of doing mathematics is maybe just compressing a lot of ideas that you've read or absorbed into some kind of package that's digestible to humans. So I don't know, maybe it's close to context compaction or something like that. Interesting.

Yeah, I guess there are these big analogies, like intelligence is search, or intelligence is compression, or something like that, and we're just better at that for now, and models have been getting better at it. Generally speaking, I'm also skeptical of those kinds of analogies. But say more.

Oh, I just think my sense is that there are a lot of ways to be good at math. If you just look at what different people are able to do, there's not that much overlap in capabilities. I don't know that there's any mathematician who can prove the same theorems I'm proving, and there are plenty of mathematicians doing stuff where I think their way of thinking is just quite different from mine.

Oh, but maybe I said this poorly, and maybe it's not falsifiable. If I think of intelligence as search, then what that sounds like to me is that you and a different mathematician are pursuing different heuristic search algorithms or something like that.

Sure. I think there's maybe a way of making sense of it, but I would argue that's sort of not very contentful, right? Sure. Like, if intelligence is some hugely high-dimensional space and you just make a name for that space, then doing a good job at it, I don't know, you can't tell if it's enlightening or not.

Could you contextualize the utility you're finding relative to previous generations of tools? Like, literature search is better, but Google Scholar existed, you know, and that was presumably an improvement over card catalogs and conference proceedings.

Yeah, I mean, I think right now the tools are on a continuum with previous generations, maybe in two ways. So first of all, literature search. Yeah, definitely the models are now, at least for some literature search tasks, better than Google or better than Google Scholar.

That probably saves some time. Is it a lot of time? Well, I don't know. I mean, how much time does it save compared to going to the library? Probably a couple hours.

A bit, every once in a while, that kind of 2%, long-run productivity improvement, right on trend. In general, yeah, those improvements seem to typically be, as you're saying, fairly small. I would be skeptical that this is more than a percent or two. Like, if AI progress stalled today, you wouldn't expect that we already have baked in an explosion in the quality of mathematics compared to what was developed 10 years ago. I would expect a sort of similar productivity growth to what we've seen, which is maybe attributable to some extent to technology, but probably mostly attributable to population growth.

You can ask just the same question about Google. Like, how much did Google, or email, that kind of thing, help? Did you live through that? Did that feel like a big change? So, I was born in 1988, and I got my PhD in 2015.

So yeah, Google was already around by the time I started thinking about math. But I have actually asked older mathematicians this question. I think the general consensus, just self-reported, is that Google did increase mathematical productivity.

But it's pretty hard to see if you just try to look. I mean, it's hard to come up with a proxy that lets you measure this. But I don't think it's obvious, just from vibes, for example, that the advent of Google led to a really remarkable growth in good new mathematical ideas. I just think literature search is not really where the main bottleneck is.

Makes sense. So there's another precursor to the development of these sorts of AI tools, which is just the development of computing. Sure. So we saw a lot of progress in a lot of different areas, already in the '60s, '70s, '80s, just with the advent of computers.

So for example, maybe a famous example is that Euler had this famous conjecture, the sum of powers conjecture, which asks, you know, when there's a solution to a sum of k-th powers being another k-th power, for some number of k-th powers, I don't remember the exact number. The first counterexample was found just via computer search, and then maybe more famously the case of fourth powers was resolved by Elkies in 1988 using a sort of very clever computer search. Gotcha. But that method would have been dead in the water without computer search, even though there was a lot of cleverness.

Yeah, that's right. So, he found a way to make these questions accessible to a 1988 computer. I think probably even now they would not be accessible to just naive brute force.
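
(To make the "computer search" concrete: the first counterexample to Euler's conjecture, for fifth powers, is small enough that a naive search like the Python sketch below finds it, whereas Elkies's fourth-power example leaned on elliptic-curve structure and sits far beyond this kind of brute force. The bound of 150 is chosen here only because the known fifth-power example fits inside it.)

```python
# Naive brute-force search of the kind that produced the first counterexample to
# Euler's sum of powers conjecture: four fifth powers summing to a fifth power
# (Lander and Parkin, 1966). Takes a minute or two in pure Python at this bound.
from itertools import combinations_with_replacement

LIMIT = 150
fifth_power_roots = {n**5: n for n in range(1, LIMIT + 1)}

for a, b, c, d in combinations_with_replacement(range(1, LIMIT + 1), 4):
    total = a**5 + b**5 + c**5 + d**5
    if total in fifth_power_roots:
        # Expected hit: 27^5 + 84^5 + 110^5 + 133^5 = 144^5
        print(f"{a}^5 + {b}^5 + {c}^5 + {d}^5 = {fifth_power_roots[total]}^5")
```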

But yeah, that's fascinating. So this was a huge development, and I think in a lot of ways, if we just stopped with existing models, we would see some kind of natural continuation of that trend.

What would that look like? I guess if right now we mostly use them for literature search and coding, maybe it's the coding aspect, like AlphaEvolve style. What do you think the developments would be?

What I'm imagining here is, you know, sometimes to make progress in mathematics you have to do a search. Yep. So, like this Elkies example, this conjecture of Euler's. And a lot of the time there's some kind of art to that search.

Like maybe you're working through a thousand different examples and you don't have an algorithm to work through each example. So each example requires some little idea, or executing some standard technique, but it's hard to write a computer program to do it.

So maybe in algebraic geometry you need to work through something where there are parts of it you can automate with, you know, a Python program, and parts of it that require some real idea. So I think, probably where the models are now, you could imagine automating some of those kinds of example searches with relatively high reliability.

That's cool. I see. So these are cases where much of it would have had to be manual, or at least the amount of manual work scaled linearly with the size of the problem, and now you can cut that down.

Yeah. So I think that's something I'm really looking forward to. Sometimes I'll write a paper which is just, here's a beautiful construction, right? And to find it I need to do some search and think about where the right place to look is. We already see AlphaEvolve as maybe some baby version of this, where there's some kind of automated search aided by a sort of clever LLM. I think that's something where I can imagine it having a really significant impact on mathematics, but I think it would be in the same spirit.

Yeah. It wouldn't be an automation of mathematics. It would be a continuation of figuring out how to use computers to reduce labor and open up new possibilities.

Yeah. If you think about, say, the proof of the four color theorem, or conjectures like that, it's a similar thing, where you have many cases that one needs to check, and you have a computer, so you figure out how to get a computer to do that.

And I suppose you could get lots more of this if you just way amped up the compute. Like, one question we ask about AI, because it's true all the time, is what's compute-constrained. Do you have a sense of whether there are some areas of math where you could really get some fascinating things if you just gave them, you know, a thousand times as much compute as they have now? Or are these kind of hard to find? I don't know.

Yeah, I mean, people do this. So if you talk to computational number theorists, I've had actually really fun conversations where someone takes a problem and says, "Oh, well, this problem will be solved in this particular year," just by taking Moore's law and asking when it will no longer be out of reach of computation. And those predictions are, you know, reasonably accurate.

Maybe you know about this example: can you write an integer as a sum of three cubes? Sure. So here we sort of know, at least conjecturally, exactly how hard each instance of this problem is. So you fix an integer and try to write it as a sum of three cubes; we kind of know about how big to expect those cubes to be.

Yeah. I see. But, you know, I think it's cool to do that for numbers where it's hard, but you double the compute, triple the compute, or multiply it, and you get a few more interesting integers. There's a question as to what extent that constitutes progress, in my view. I think there are definitely people who are more excited about it than I am.
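
(The sum-of-three-cubes problem is easy to play with directly; a minimal bounded search is sketched below in Python. "Easy" integers fall out immediately, while the famously stubborn cases such as 33 and 42 have no solution with small entries; their known solutions, found only in 2019, involve integers of 16-17 digits, which is exactly the compute-scaling story being described.)

```python
# Bounded search for integer solutions of x^3 + y^3 + z^3 = n, allowing negatives.
# A small bound finds the easy cases; it (correctly) finds nothing for hard n like 33,
# whose known solution has 16-digit entries.
def sum_of_three_cubes(n, bound=200):
    cube_roots = {x**3: x for x in range(-bound, bound + 1)}
    for x in range(-bound, bound + 1):
        for y in range(x, bound + 1):
            remainder = n - x**3 - y**3
            if remainder in cube_roots:
                return x, y, cube_roots[remainder]
    return None  # no solution with all entries inside the bound

for n in (6, 29, 33):
    print(n, sum_of_three_cubes(n))
```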

Fair enough. Makes sense. Yeah, I'm imagining things where, you know, you're maybe looking for an example of some construction where there's not just a known algorithm that computes it, and you instead ask, you know, GPT to cleverly search and let it come up with ideas.

And if you give it enough test-time compute, yeah, maybe eventually it will come up with something. And here I'm not imagining it coming up with better and better ideas as the test time continues, but just trying more and more things.

Sure. And certainly if you had some way to verify, even if it's just more test-time compute, whether an idea looks good, or something like that. Okay.

And then I did want to just cover this. Are there problems that AI is causing right now?

I mean, I think we know cheating in college is clearly an issue; it's more tempting, a lower bar to cheating, that sort of thing. What about junk papers? What's the state of this? So, I mean, there are definitely junk papers. I started counting, in maybe September of this year, papers on the arXiv with the phrase "Hodge conjecture" in the title. The Hodge conjecture is one of the six remaining open Millennium Problems. It's the one whose statement is probably the hardest for a lay person to understand, so for a long time it was safe from cranks, because you just couldn't write anything comprehensible about it.

So now that's no longer true. Of course, the frontier models can write reasonable-looking text about the Hodge conjecture. So I think in September and October of this year, there were something like 12 or 13 papers posted to arXiv math.AG, algebraic geometry, with "Hodge conjecture" in the title or abstract, and all but one of them were nonsense.

Wow. And of course I can't prove that they were generated by an LLM, but based on the writing style it was quite clear. To be fair, there were maybe, you know, 12 or so papers of this form, and they came from a smaller number of authors; the number of authors was not 12, it was maybe like six. I see, so there are some repeat offenders. But to what extent is this causing a problem? Well, you know, it wasted several minutes of my time. As the LLMs get better at writing coherent-looking text, you know, before you would have to spend 10 seconds to find the nonsense.

Now you have to spend a few minutes, actually. So, for the most serious offender here, the argument really did not make sense, but I really had to go to the middle of the paper and see that some statements were just nonsense. The introduction was totally reasonable. Interesting.

Yeah. And it was making some quite dramatic claims, which is why I was motivated to check that it was BS. But yeah, it wasn't a totally trivial job to do, and I think that problem will get harder.

Yeah. So in particular in areas where formalization, for example using Lean or other formalization software, is not really practical right now, I think that's going to be an issue.

That's interesting. And you can imagine way worse versions of that, which I'm sure are really happening, where a serious person, maybe a graduate student who's stuck on a problem, uses one of these models to generate a nonsense proof of a lemma in the middle of their paper, where maybe it doesn't make sense. Then the paper, 99% of which is probably correct, has no value because there's some nonsense in it.

That's interesting, and that's very hard to catch. I mean, that of course already happens; people have been bullshitting since time immemorial. There are lots of wrong papers out there, but a lot of this is about the marginal cost of doing something, and the marginal cost of lying and cheating is getting a lot lower.

Again, if capabilities froze, is this sort of at the level that society manages, because society generally, you know, muddles through? Or are there any things you'd be particularly worried about?

I mean, I think it's just contributing to something that already exists: right now in mathematics there is, I think, a refereeing crisis. There are way more papers being generated than can be refereed carefully, and I think this would contribute to that. It would continue to worsen, right?

A lot of that is about the incentives in math academia and not the models themselves. I think we'd manage the way we've been managing, which is imperfectly.

Yeah. I mean, I do think there's some hope here; of course the models can also help check papers, and there are already sort of nice tools being developed. Yeah.

So, building on top of that, I want to think longer term now. Great. A big part of all of this progress in AI and math is about compute and scaling. And one of the things that we discussed earlier was being able to get the AIs to run lots of examples, to help you work through lots of examples and do this kind of stuff.

So I'm kind of curious how we should expect the fields of math to evolve as we get more of this ability to run lots of experiments at scale. I'm kind of thinking of how previous numerical simulation enabled, say, lots of simulations in meteorology and economics. What should I expect in the case of math, where we have systems that, say, are able to solve FrontierMath, and we're able to run lots of these models at the same time?

Yeah. So here, just to make sure I understand, you're asking about the sort of thing I was proposing earlier, where you're just sort of working out lots of examples, not trying to get the models to prove some single hypothesis. Exactly.

Yeah. So I think we'll see kind of a continuation of previous trends, like what came with the advent of computers: we'll be able to work through larger volumes of interesting examples. Like, maybe, I don't know, I want to find an algebraic variety with properties X, Y, and Z; I can just have the model start trying things. The main benefit, I think, is that the cost of trying the first dumb thing you would think of gets very low. Historically, if I want to find a construction, I have to sit down and try a few things, and even if doing that requires very little cleverness, it still takes a few days of my time, maybe, and I'm a busy guy.

Oh, and moreover, you know, this might be just an idle question, and I have other things I'm more excited about. Okay, so there's some opportunity cost to doing these things. That opportunity cost, and also just the monetary cost, the marginal cost of trying something, will get very low, even if you're asking a model that's not that capable to do it.

That's very valuable. So yeah, I think one way mathematics moves forward is that you just look for examples of cool stuff, and every once in a while you discover such examples. That doesn't necessarily require deep insight or brilliance; it just requires spending some time. And so having automated search for cool stuff would be a really big deal.

So here I'm thinking about, I don't know, the sporadic examples of cool things, like the sporadic finite simple groups or the exceptional Lie groups, where, of course, people are searching for them in some kind of fairly principled way, but ultimately at some point you have to make a discovery. And a lot of the time you just make that discovery by noticing someone worked out a cool example and then observing some interesting property of that example. That's happened to me.

I mean, I think some of the projects I'm most proud of are ones where I noticed something cool in the literature and sort of drew some consequences out of it. That's interesting. So that can be very productive.

Yeah, I think that can be very productive. I think it's a big part of how math moves forward. It's not just that the most brilliant people are proving amazing theorems; there's a huge background of people like me who are maybe more workmanlike, doing work, just thinking about lots of cool stuff, and every once in a while they'll find something.
