
Author: Weights & Biases
Date: October 2023
Quick Insight: Axiom Math builds a self-improving AI reasoning engine combining probabilistic generation with deterministic formal verification. This approach achieved human-beating results in advanced mathematics and promises to revolutionize safety-critical software and hardware verification.
Introduction: Karina Hong, a former academic mathematician and now CEO of Axiom Math, leads a company that just raised $64 million. Axiom's AI mathematician achieved a top Putnam exam score by marrying AI generation with formal verification, a new paradigm for provable correctness.
Podcast Link: Click here to listen

Axiom's mission is to build a reasoning engine that is self-improving and that combines generation and verification. The Putnam, for people who aren't math nerds, is a contest among math undergraduates that's incredibly hard; the median score is literally zero. With our Axiom prover we recently got eight out of 12. We use formal languages like Lean to ground the natural language counterpart, and because we are doing that, we witness the power of deterministic tooling and probabilistic systems working together.
You're listening to Gradient Dissent, a show about making machine learning work in the real world, and I'm your host, Lukas Biewald. Karina Hong is the CEO and founder of Axiom Math, a company building what it calls a self-improving reasoning system, and it is currently on the cutting edge of building an AI mathematician. They've had astonishingly strong results on math olympiad problems and recently got the highest score on the Putnam exam, which we talk about. Karina is super interesting: she was a star academic mathematician who has now turned founder. Axiom Math came out of stealth having raised $64 million. She talks in detail about how Axiom works, how she thinks it applies to more than math over time, and also how hardcore rock music in China influenced her thinking around management and staying hungry. I hope you enjoy this interview.
Karina, why don't we start with the most basic question: what is Axiom, and how does it work at a high level?
So, Axiom's mission is to build a reasoning engine that is self-improving and that combines generation and verification, which we think is an overlooked component in the current AI landscape. We want to start with an AI mathematician, because math is a really good testing ground for this sort of self-improvement loop. We use formal languages like Lean to ground the natural language counterpart, and because we are doing that, there are a lot of interesting capabilities we can unlock with much higher sample efficiency.
The general vision of Axiom is that there is a prover, a system that can prove theorems, and also a conjecturer, a system that can propose interesting conjectures, and they talk to each other. Supporting those two, you have a third component called the knowledge base. If I am the prover, I would like to know what has already been proven and what I can use. If I'm the conjecturer, I would like to stress test whether a conjecture is reasonable; perhaps a counterexample will help me out by ruling it out so I never suggest it at all. The knowledge base is a very important component supporting these two. And then autoformalization, the system that can automatically translate natural language statements into formal language, is what weaves all three together. That's the basic setup of Axiom.
Can we break that down in the context of something simple that everyone could understand? Say we try to prove the Pythagorean theorem, assuming it's not already in your knowledge base. How would that work practically?
If it's the Pythagorean theorem, which has already been proven, hopefully, millions of times, sure, let's just do it. But for a de novo math problem: we recently got a really strong, winning score on the Putnam exam, the 2026 one, because the exam is held in December but named for the next year. Within the exam time limit we got eight out of 12. We are going to announce our actual final score, but we also announced nine out of 12, because the ninth problem came in two hours after my tweet, which was a little awkward. You're like, oh wait a second, everyone, don't repost that tweet, repost this one. That's nice.
Nine out of 12 was last year's number-one score among all 4,000 human participants. And it's Putnam Fellow territory, the top honor, top five, almost every year, barring some years where the problems somehow got really easy. We find that result very encouraging, especially given that in the final proof release we're going to say exactly how many problems were solved. I think that's very exciting. We witnessed the power of deterministic tooling and a probabilistic system working together, and the capabilities that come from that.
Okay, sorry, back to your question. Maybe we should first say: the Putnam, for people who aren't math nerds, is a contest among math undergraduates that's incredibly hard. The median score is literally zero, and it's the best math undergrads who take it, so getting any of the problems right is a real accomplishment, and beating everyone is kind of amazing; all the best mathematicians did super well on this. Tell me if that's a good description.
That's right. I personally got a much worse score than our prover system did, so it's a very interesting situation, I think.
What did you get? Wait, let's brag a little bit. What did you get?
Well, I got four out of 12, which is not that great, just a top-250 honorable mention. And Axiom's final score will potentially triple mine. I think we should have a "beat Karina" kind of party, which is always fun.
I've noticed that using LLMs like GPT or Gemini directly, without your technology, they seem surprisingly good at math-olympiad-style questions, maybe not the Putnam, but the scores are impressive. But then, we have a mutual friend, Robert Nishihara, who has given me a bunch of brain-teaser questions, and one of them was a pretty simple, fun question. I put it into GPT and Gemini and they actually couldn't solve it. To me, a math olympiad is so much harder than the kinds of questions you might get in a Google interview, but it seems like maybe LLMs don't have the same hierarchy of difficulty. Do you have an intuition for what's easy and what's hard for them? And is your system similar? Would you find that some things that feel trivial to a junior mathematician are hard for yours?
I think it's fascinating to think about LLMs doing things like math, and, in safety-critical domains, writing code in high-stakes scenarios. Writing code statistically just doesn't work when you want provable guarantees on the edge cases. It's similar in math: a proof is a proof, it must be sound and must not have any critical flaw. Which is why simply scaling informal models into a math AI just isn't something we believe in.
As for different kinds of math questions, it's easy to make a case that for those that require very fine-grained checking, Lean will be very good. For things that rely on high-level intuitions and don't have that detailed part, the reasoner, the informal model, will probably make a pretty good contribution. When it comes to computations, certain tools, like computer-algebra plugins, are probably helpful.
It's very interesting to see, for example, a simple positivity argument in analysis. We're actually planning to do a Putnam proof release and commentary, where we look at each problem, how the AI solved it, and whether that's expected given how a human might think about it. In the analysis parts you can see the AI doing a lot of work for something a human student or researcher would just wave through: it will produce chunks of code to rigorously ensure that things around convergence and limits are very carefully handled. That is actually why, in the initial days of Mathlib, funny enough, the undergraduate algebra texts were all formalized rather quickly, and it took a long time to do the analysis. Analysis and algebra are usually the two core subjects at the undergrad level. I actually found a graph that I would like to show.
Okay, so what I'm showing is not a Putnam problem; this is Erdős problem 481. A Twitter user, I think an undergraduate math student, solved this problem by himself, was very excited about it, and posted about it, and we tried to see whether our system could solve it as well, just as a Twitter daily-engagement thing. This is what the system does: it breaks the problem down into different subgoals, results or lemmas. It starts with this root node not being green, and only when every other node has hit green at the very end does this node turn green.
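As a rough illustration of that subgoal structure (a toy statement, not the Erdős problem itself, and assuming Mathlib is available), each `have` below is one intermediate node that has to close before the root goal does:

```lean
import Mathlib

-- Toy example: the midpoint of a ≤ b lies between a and b, proved via two subgoals.
example (a b : ℝ) (h : a ≤ b) : a ≤ (a + b) / 2 ∧ (a + b) / 2 ≤ b := by
  have h1 : a ≤ (a + b) / 2 := by linarith  -- subgoal node 1
  have h2 : (a + b) / 2 ≤ b := by linarith  -- subgoal node 2
  exact ⟨h1, h2⟩                            -- root node closes once both are proven
```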
Can you turn that into something that's interpretable by a human?
Yes, I think you can. In fact the goal is that for each node you can click into it and see the Lean code. Auto-informalization, converting Lean back into natural language, is significantly easier than the other way around, autoformalization, just because models have seen a lot of English and not a lot of Lean. It's like in translation: from English to another very niche language, one direction is significantly easier. So yes, you can poke into it, and I think that will be something very exciting for someone who wants to interact with the system while it's doing the proving work.
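For a sense of what that direction looks like, here is a hypothetical autoformalization target (my own toy example, assuming Mathlib): the English statement "the sum of two odd integers is even" rendered as a Lean statement, together with one possible proof. Reading the statement back into English is close to mechanical, which is the asymmetry described above.

```lean
import Mathlib

example (a b : ℤ) (ha : Odd a) (hb : Odd b) : Even (a + b) := by
  obtain ⟨j, hj⟩ := ha          -- extract the witness for Odd a
  obtain ⟨k, hk⟩ := hb          -- extract the witness for Odd b
  exact ⟨j + k + 1, by omega⟩   -- a + b = (j + k + 1) + (j + k + 1)
```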
Cool. One of the things I've noticed with code generation is that LLMs today seem better at generating code that works than generating code that feels easily interpretable by a human, which totally makes sense given the reinforcement learning they're surely using. I would think you might have a similar issue: whether generated Lean proves to be true is probably an easier thing to check than whether a human could understand it, and I would think your models probably generate things in different ways than humans do. Already, some of the more technical math proofs are really hard to process. Do you think we're entering a world where models might prove the Riemann hypothesis but no one understands it? Everyone might agree with every step, but no one can actually get an intuition for what happened?
It's a fascinating question, because say a human proves the Riemann hypothesis. In certain cases, I remember back when Professor Yitang Zhang announced he had proved a big result, an expert panel was assembled very quickly. I was at Oxford at the time, so some of the professors I was interfacing with were telling us about how they got pulled together to try to examine the proof, and because certain mathematicians write in a way that's very obscure, the experts couldn't immediately tell what was going on either. So I would start with that.
I do think that for Lean proofs, which have this verifiability feature, it's really good news. Imagine GPT one day produces a 10,000-line proof claiming it has solved an open conjecture; especially because it's trained on chain-of-thought data among other things, we really can't be sure it's correct. That scenario happened in December 2024, around the 2025 Putnam. I remember Dan Hendrycks posted on Twitter, hey, I fed these problems into, I think it was o1 Pro or o3 preview, I forget, and people were grading those model outputs with wildly different grades. No one in the Twitter comments was able to say for sure whether a given solution was correct or not. The consensus was something like: no, it's not correct, because for each of these problems there is a significant hole that the model tried to hand-wave.
With the power of formal systems, you cannot hand-wave. In fact it does a little too much of the opposite: it's almost annoying that it rigorously checks down to every fine-grained detail, like convergence and limits. I'm like, you really don't need to spend that amount of space to prove the simple positivity lemma, but it does, and that lemma is actually one of the heaviest nodes in the proof graph we produce. So you have that peace of mind. As for elegance and intuition, first of all, for a lot of problems it is very hard, given the human intellectual ceiling, to produce two proofs. If someone tells me, hey, I have two proofs of the Riemann hypothesis, that's a much more unlikely statement than, I have one proof. There are certain problems where you feel there is bound to be essentially one proof. That would suggest that what the AI finds, minus the parts where it might be doing redundant stuff, repeating the same thing again and again, which you could call ugly, once you strip those out, there's a good case to be made that the streamlined version converges to what a human would possibly find. I think that's extremely exciting.
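As a concrete sketch of that kind of "obvious" step (assuming Mathlib is available), a simple positivity fact still has to be discharged explicitly in Lean, either by a dedicated tactic or by spelling it out as its own lemma:

```lean
import Mathlib

example (x : ℝ) : 0 < 1 + x ^ 2 := by positivity

example (x : ℝ) : 0 < 1 + x ^ 2 := by
  have hx : 0 ≤ x ^ 2 := sq_nonneg x  -- the step a human would not bother to write
  linarith
```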
So treat AI not as some sort of second-class citizen but as just another mathematician collaborator, maybe one that you have not heard of yet, maybe similar to the Ramanujan case, where Hardy and Littlewood at Cambridge just discovered this self-taught genius from India. I think that's a very interesting mindset for thinking about an AI mathematician. And speaking of which, I also found the other proof tree. Sorry, I'm sharing my screen again. This is how the system is proving problem A1, and this is a really complicated one; obviously it isn't proving it this fast, we have a time acceleration. If I don't refresh it... yeah, so this is what it's doing. Let me show one of the questions more simply, just so people can see what that looks like.
Yeah, I think I can. A3 is a problem people can definitely understand. It's a game problem where Alice and Bob, the standard Alice and Bob, play a game on a string of n numbers, each of which can only be zero, one, or two. In the beginning everything is zero; that's the start state. At each move a player either adds or subtracts one from one digit to create a new string that has not appeared before. If you cannot do that, as in you have run out of all the possible strings, then the other player wins. Alice always goes first. The question is which player has a winning strategy.
That's a standard sort of game-theory problem in competition math, and a lot of students looking at it will have about two or three very widely used results from competition math for attacking this kind of problem. For example, in a finite game with no ties, one player is guaranteed a winning strategy, and the question is just which player, which is exactly what's being asked. Students will try to write down the strings and imagine they are Alice or Bob, to see how they would attack the game and whether there's any pattern they can find. This is really the part about intuition discovery: if I'm Alice or Bob, what would I do? People actually solving the problem will try to figure out, hey, are certain states just a lot worse than others? If I'm faced with a string and I have a zero in one position, I cannot subtract there, I can only add; and if I have a two, I cannot add. So a one in a position potentially gives me a lot more flexibility. You try to generalize these really bad states and really good states into determining the sequence of moves, or what the rational move for the opponent is when you put them in that situation. It's a very fun problem. And there are obviously more tedious problems: if you look at, say, A2 and B2, there's a lot of calculus involved, because the Putnam is the hardest undergraduate-level exam, so unlike the IMO, which is high-school level, calculus appears. For people who are very proficient with integration, I don't think A2 or B2 poses any significant challenge, and for the AI prover I don't think they're challenging either, but it does take a lot of lines of Lean code to handle some of the "take it as given" steps. In some cases the AI proves things very differently, with a very different solution from what our human mathematician experts expect, and those are the most fun cases to watch.
Well, let's talk about A3, because that's a great one, right? It's so easy to understand. Where my brain goes is, okay, let me try the simplest case. It looks like all you can do at first is add one, so Alice adds one, and now you have a one. Bob can't subtract one, since that string has already appeared, so Bob adds one. Then Alice is stuck, so player two wins, I guess, right? Is that accurate?
Actually, first of all, sorry. So you're saying the first player can add one to get one, and then Bob goes and adds one and gets two.
Yeah. And I think that's right. So actually Bob always wins. There's a very clean winning strategy for Bob that can be described succinctly, and once you see it, Bob doesn't need to respond with much care to Alice's moves at all, which is what makes this problem slightly easier than, say, last-problem Putnam level. It's only A3; there are six problems in each session, and usually we expect four, five, and six to be harder than one, two, and three. Basically Bob can just execute that strategy without thinking much, which also makes it much more tractable in Lean: there's very little state to track and there's no complicated branching to reason about.
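For readers who want to poke at the game themselves, here is a minimal brute-force sketch in Lean (an illustration only, not how Axiom's prover works; all names are made up), which checks that the player moving second wins on a tiny board:

```lean
-- Legal digit changes: a digit in {0,1,2} can move up or down by one.
def digitMoves (d : Nat) : List Nat :=
  if d = 0 then [1] else if d = 2 then [1] else [0, 2]

-- All strings reachable from `s` in one move (change exactly one digit by ±1).
def neighbors (s : List Nat) : List (List Nat) :=
  (List.range s.length).foldr
    (fun i acc => (digitMoves (s.getD i 0)).map (fun d => s.set i d) ++ acc) []

-- Does the player to move have a winning strategy, given the strings already seen?
partial def toMoveWins (seen : List (List Nat)) (cur : List Nat) : Bool :=
  let moves := (neighbors cur).filter fun t => !seen.contains t
  moves.any fun t => !toMoveWins (t :: seen) t

-- Alice moves first from the all-zero string; `false` means Bob wins.
-- The search is exponential, so this is only feasible for very small n.
#eval toMoveWins [[0, 0]] [0, 0]  -- expected: false (Bob wins for n = 2)
```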
But I guess, and it's always dangerous to introspect your own brain here, I think the human approach is to think through cases and look for a strategy that makes sense, not to write a formal mathematical proof. So do you think Axiom is actually taking a different approach than a human would? Or do you mean for this problem?
Yeah, for this specific A3 problem. For this specific A3 problem, I would say it's aligned, how the AI thinks about it and how a human thinks about it. There are other scenarios where we see the AI taking a different path from humans, because it knows it can brute-force through things, whereas a human, if you're an actual test taker, wants to avoid brute force when it might not work. As in, if I'm doing a Euclidean geometry problem and I have no idea how to solve it, and I know that the brute force, the complex-coordinates method, can definitely get me there, I'm probably willing to sink an hour into brute-forcing that, but not necessarily for some of the non-geometry cases. There the human will look for a clever, one-diagram proof instead. Which is a good segue.
I want to ask you about the applications of this. What are the milestones along the way? My former CRO is on your board and is one of your investors, and I'm kind of curious how you pitched this math LLM to them.
I think people saw it for themselves as well. Something I find really amazing is that when I'm talking to investors, there are real intellectual alignments with certain funds. Some funds even have internal memos marking that this category isn't even one worth debating; it's something they consider inevitable for the future. Think about the lag of verification in hardware and software, where design teams are one-third or one-quarter the size of verification teams, and verification can take, as I said, three years in the hardware scenario. Or think about software: we now have a lot of prosumers, people who aren't native programmers, using all sorts of coding models and tools like Cursor, but there's a lot of need for code review, especially in safety-critical cases.
I do think enterprise customers are facing a really unfortunate choice between, hey, I just want this to not be wrong, so I'll have a human eyeball it or do it manually instead of relying on AI, and that path being very painful for the reasons we described: the time, the resources, how inaccessible certain experts are. Think of the difference between a high-expertise formal verification expert in chips and a medium-expertise one. If you can make it so the AI handles the really notorious cases, you lower the bar for hiring, and I think that would be incredibly valuable. For example, AWS, which has one of the best automated reasoning teams, took I think five years to formalize the memory-isolation component of its hypervisor by hand. That's a shocking number, and it's not the entire hypervisor, it's one part.
Also, with more vibe coding now and more agents generating code, verifying that the edge cases are covered is going to be very important. Another use case is code migration. If you have legacy code and you want to upgrade it, you want to make sure the new code is completely equivalent to the legacy code, not doing better, not doing worse, because that legacy code is serving important business functions. That's another very interesting use case. And in the database scenario, if there can be bad actors, you want to prove that the system is actually consistent. Databases today are sharded, and they can be inconsistent in that sense because of this possibility of bad actors; formal verification can provide a very interesting solution to all of these.
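As a toy sketch of that migration idea (hypothetical functions, assuming Mathlib for the `ring` tactic), you can prove once and for all that a legacy implementation and its replacement agree on every input, rather than sampling test cases:

```lean
import Mathlib

-- "Legacy" implementation: a straightforward recursive sum 0 + 1 + ... + n.
def legacySum : Nat → Nat
  | 0 => 0
  | n + 1 => legacySum n + (n + 1)

-- Equivalence with the closed form n * (n + 1) / 2, stated via the doubled identity
-- to keep the induction free of division.
theorem legacySum_closed_form (n : Nat) : 2 * legacySum n = n * (n + 1) := by
  induction n with
  | zero => simp [legacySum]
  | succ k ih =>
    simp only [legacySum, mul_add, ih]
    ring
```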
Yeah, I mean, in the software domains I'm more familiar with, I feel like the test cases are not quite as black and white as a formal proof, and they might be kind of over-constrained. I think the real challenge would be formalizing what it means to be wrong clearly enough that you could actually check it. I do think that's very interesting, and for the database case specifically, I think having some sort of abstraction that is easy to interface with will be a big unlock.
Basically, what I think is going on is that there are certain use cases where this is a dire need: we really need to verify everything, maybe because regulation mandates it, maybe because otherwise it just can't go into deployment, so currently we have humans doing it to make sure that, to the best extent possible, everything is sound. Then there are the good-to-have cases, say the database case: if we don't have it, maybe we can live without it, but having it is obviously valuable, and it becomes a weighing of how painful it is to use versus how much capability it unlocks. Those are interesting trade-offs. And then there's a final group, say vibe coding a Lovable-style website, that wouldn't necessarily need formal verification, just as the AI mathematician we are building doesn't need to write poetry. So there's a three-tier structure, and our goal is obviously to start from the first tier and eventually take the second. By the time we take that second group, it will be a big moment in the broader landscape of AI, much beyond math. Think about the first case: if there's a proof of the consistency of the Paxos protocol, written by, say, Leslie Lamport, you can autoformalize that proof in Lean, and as part of that formalization the system can generate code in a strongly typed language, Rust for example, to execute the protocol. This sort of unification of generation, meaning the implementation, and verification in one is going to be a powerful idea.
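A tiny Lean sketch of that "implementation and verification in one artifact" idea (illustrative only, hypothetical names): the return type itself carries a correctness guarantee, and the proof is checked when the definition compiles.

```lean
-- The result of flooring division comes packaged with the bound it satisfies.
def flooredDiv (a b : Nat) : { q : Nat // q * b ≤ a } :=
  ⟨a / b, Nat.div_mul_le_self a b⟩

#eval (flooredDiv 17 5).val  -- 3; the bound 3 * 5 ≤ 17 was proved at compile time
```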
Then one question you might ask is how much you can extend that from Rust to some weakly typed languages and slowly close the gap, knowing that it's not possible to formally verify all code in Python, but you can do a majority of it; and which programming language ends up being people's choice is itself an open question. And talking about the chip verification use cases: in the 1980s, for example, a lot of chip companies tried to do formal verification with humans who were very good at writing theorem-proving languages, much more ancient counterparts of Lean, and back then it didn't quite scale, because the humans were the bottleneck. Perhaps AI formal verification can have a huge comeback here, and at Axiom we're starting with an AI mathematician.
Yeah, it's funny. It feels very old-fashioned in a way, right? I remember in the 90s there was a lot of enthusiasm around formal theorem proving in AI, and you haven't heard much about it in the last few decades. It's kind of interesting to imagine pulling things from the 80s and 90s, in terms of searching over theorems and trying to build proofs.
I think there's a lot of modern research here as well. Specifically, the timing: reasoning models' code-generation ability improving greatly, RL being applied in verifiable domains, and Lean becoming a widely celebrated language, with more developers than ever excited about it and mathematicians willing to embrace it. All three things coming together make now a really good time, in both the software and the hardware verification landscape. In software, for example, there's CSLib, under development at Stanford's automated reasoning lab led by Professor Clark Barrett and supported by industry partners such as DeepMind and AWS, which is trying to formalize the undergraduate CS literature in Lean. There are people trying to formalize machine learning results in Lean. It's a very interesting moment we're living in. And going back to databases, I'm curious about your thoughts: all distributed databases have to solve the Byzantine generals problem, and the consistency of those solutions needs to be formally proven. I don't know whether something like that is of interest or whether people can live without it; I'm actually quite curious about that myself.
Well, I always wonder. I'm also not a database expert, but it makes me think: there was a time when the big application of databases was financial services, and people really focused on making sure things were completely consistent, that you could have real invariants. Then what I've seen over my career is this explosion of interest in NoSQL and databases that relax those consistency constraints, because people wanted to operate more efficiently, to save all their user data and be able to query it. There are these other competing concerns, and it seems like the majority of database applications maybe didn't need that level of formal consistency. There are always going to be applications where you want to be incredibly careful, but there also always seem to be trade-offs pulling certain applications the other way.
I agree. I think the interesting response to that is that you almost don't have to give up the efficiency, because of the strong reasoner component in the system, so it will be a very different thing. Again, the reasoner-prover framework: take, for example, the earlier AlphaProof system, which, by the way, was to me the IMO moment, to be honest with you, with its score of 28, a silver medal, with 29 being the gold cutoff. They got every question except the combinatorics ones, which is the same as the 2025 IMO, where a lot of the companies got 35, missing the one combinatorics problem. The difference between the two years is that in 2024 there were two combinatorics problems and in 2025 there was one. AlphaProof is a formal system, and there you see the trade-off: if you have only an informal system, it cannot formally prove anything. Combining these two is going to be game-changing, even from the customer's perspective, because with the natural language part you don't actually need to painstakingly spell out, as a customer, exactly what you want.
Part of the goal of the autoformalization capability is deciding what the system does when facing ambiguity. I really like deep research, I use it a lot, and some agents, like Manus, will ask me for my preference on this or that. That's interactive feedback, which is good, but once that list gets to 500 items I will obviously lose patience. So what to do when facing ambiguity is a very important part of autoformalization. Most people think about autoformalization as a data-generation trick, but for us it's a core technology. We think autoformalization is potentially harder than proving, a problem similar to proving; it's a core technology, and the two almost can't be separated. For us, the difficulty in autoformalization is always the statement, because if you want to autoformalize a proof, you have the Lean checker signal, so that's fine, but formalizing the statement correctly is challenging, because by definition you don't yet have the solution to check against. In the code-verification setting it's very convenient to autoformalize, because you have test cases, input-output pairs, as a proxy to ground your formal spec. We think this sort of spec-based programming, and similar products like Kiro from AWS, is very valuable and visionary thinking.
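Here is a small sketch of that spec-grounding idea (hypothetical names, plain Lean with no extra libraries): input-output pairs act as a cheap sanity check on a candidate formal spec before the full proof is attempted.

```lean
def myAbs (x : Int) : Int := if x < 0 then -x else x

-- Candidate formal spec for an absolute-value function.
def absSpec (f : Int → Int) : Prop :=
  ∀ x, 0 ≤ f x ∧ (f x = x ∨ f x = -x)

-- Input-output pairs as a proxy check that the spec and implementation line up.
example : myAbs (-3) = 3 := by decide
example : myAbs 4 = 4 := by decide

-- The actual verification against the spec.
theorem myAbs_meets_spec : absSpec myAbs := by
  intro x
  unfold myAbs
  constructor
  · split <;> omega   -- nonnegativity in both branches of the `if`
  · split
    · right; rfl      -- x < 0: the result is -x
    · left; rfl       -- otherwise: the result is x
```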
I mean, I'm a little bit out of my depth here, but isn't there a kind of halting problem here that limits where this can possibly go? Just because you have test cases doesn't mean you'll know.
You cannot verify all code in Python, that's not possible, but you can do it for a majority of it. You can do things that make people's lives easier with the capability you have. I think I said about ten minutes ago that this is an important qualification: you cannot do all code, but you can do quite a lot.
Cool. Are you already working on commercialization of your technology, or are you still in the research phase?
We're actually, I think after the holidays, a six-month-old company now. We thought the R&D stage would take a long time, and I still think today that we are only at day zero; there's so much to do. But that doesn't mean we aren't in active conversations with some trusted partners about what this technology could solve for them. Through these conversations, first of all, it's very intellectually expanding, and it feels similar to my seed-round fundraising process: there's real intellectual alignment with the people you're talking to, and everyone feels a bit like a secret keeper. I feel like we see something the world doesn't see yet, and that feeling, when you're talking to the right people, is just exhilarating. So yes, we're obviously having some early conversations on that front.
What do you predict for mathematics going forward? Of course it's nice to say that this is a helpful friend and collaborator, but it's going to get better and better. I imagine that in some amount of time all new proofs will be generated in this manner. Do you think academic mathematics continues, or...?
I think mathematics will continue. Mathematicians will learn to work at a different level of abstraction than they're used to, and I think that change of abstraction is going to meet less resistance than people expect. Take an example: before LaTeX, you'd have a typewriter, or a secretary, typesetting your math paper, I think; I'm not sure. But after LaTeX came out, mathematicians welcomed it and used it as a really good tool. And by the way, having computers help with parts of mathematics research is not new; computer algebra systems have been aiding a lot of the more computational work.
We find that this will probably free mathematicians who have already developed very good intuitions from the manual labor needed to painstakingly check every technical lemma. So if you're someone like Terence Tao, a really great mathematician, just