Epoch AI
October 9, 2025

Why frontier AI can't solve this professor's math problem - Greta Panova

In this episode, USC Mathematics Professor Greta Panova breaks down why she designed a math problem that stumps today's most advanced AI. She offers a clear-eyed look at what AI can and cannot do in theoretical math, and what its evolution means for the future of the field and humanity itself.

The Unsolvable Problem

  • "I can come up with something hard that's basically only I know how to do... even if you tell it basically what to do, what papers to look at, it just has no idea how to proceed."
  • "It's not an elementary problem. It involves theoretical constructions that are graduate material... it's much harder than the hardest problems on the Putnam [exam]."

Professor Panova crafted a problem for the Frontier Math Symposium that is intentionally non-canonical, requiring multiple non-trivial steps and obscure theoretical ideas not found in mainstream textbooks. While she initially forecast that AI could solve it in 1-2 years, she now believes it will take longer, given models' poor ability to extrapolate from limited data. Unlike a standard exam question, this problem tests for genuine creative synthesis, a skill frontier models currently lack.

The Ambitious But Unprepared Student

  • "The logic is not properly built in yet. It's doing logical errors... it's just piecing together things that seem to look right."
  • "One really unprepared but ambitious student comes and says, 'I want an A. Look here. I did this.' And I say, 'That's not quite correct.' But he says, 'Can I have some more points?'... I wish it wasn't doing this."

Panova describes current AI as a student who is great at mechanical tasks—generating code, summarizing papers, and running computations—but fails at rigorous logic. The models are adept at piecing together arguments from existing literature but can't spot gaps in their own reasoning or verify proofs. This leads them to produce answers that sound convincing but are fundamentally flawed, much like an ambitious student trying to bluff their way to a good grade.

The Future of Mathematics

  • "The human part in the process is going to be the selection or the editorial part where we figure out what's valuable, what's not, in what direction to go, which problems to solve."
  • "If the AI actually can do math on the level of a math professor, then it will be able to do anything else that is just intellectual work... and then the whole of humanity has a problem."

Panova rates AI's potential to reshape mathematics at 8 out of 10. In the short term, it will serve as a powerful assistant, automating tedious tasks. Over the longer term, the mathematician's role will evolve into an "editorial" function: curating knowledge, defining important problems, and guiding the field's direction. She warns that if AI ever masters high-level math, it will signal an ability to replace all intellectual labor, posing a profound societal challenge.

Key Takeaways:

  • Mathematicians Must Steer AI's Development in Their Field. Without expert involvement in creating benchmarks and evaluating results, decision-makers could misinterpret AI's progress (e.g., a correct numerical answer reached without a valid proof) and prematurely declare "math is solved," which would be catastrophic for research and funding.
  • AI Is a Pattern-Matcher, Not a Logician. Current models excel at synthesizing existing knowledge but fail at the novel, multi-step creative reasoning required for frontier mathematics. They lack the fundamental logic to build sound proofs from scratch.
  • The Mathematician Becomes the Editor. As AI automates computation and literature reviews, the primary human role will shift to strategic oversight: identifying valuable problems, validating AI-generated work, and setting the research agenda for the entire field.
  • Benchmark or Be Disrupted. The math community must lead the charge in creating and assessing rigorous AI benchmarks. Failure to do so risks letting non-experts define success, potentially devaluing the discipline based on superficial AI achievements.

For further insights and detailed discussions, watch the full episode: Link

This episode reveals why frontier AI models are still far from mastering true mathematical reasoning, offering a crucial reality check on their limitations and the enduring value of human intellect in complex problem-solving.

Crafting an AI-Proof Math Problem

  • Greta Panova, a mathematics professor at USC specializing in algebraic combinatorics and computational complexity theory, details her process for creating a problem for the Frontier Math Symposium. Her goal was to design a question that was not guessable and required deep, non-mainstream mathematical insight.
  • Panova explains that the problem she developed was based on her own PhD research, involving multiple non-trivial steps that are not found in standard textbooks or literature.
  • She intentionally selected a problem that current AI systems, even when given direct hints and relevant papers, were completely unable to solve.
  • Panova confirms the problem is significantly harder than those on the Putnam Mathematical Competition, a prestigious and notoriously difficult undergraduate math contest, as it involves graduate-level theoretical constructions.

Assessing Current AI: Strengths and Critical Flaws

  • Panova provides a grounded assessment of current frontier AI models, highlighting a stark contrast between their data-retrieval capabilities and their profound lack of genuine logical reasoning.
  • Strengths: Models excel at tasks that draw on their vast store of training data, such as searching for information, mixing and matching arguments from the existing literature, running computations, and generating Python code.
  • Weaknesses: The models' core weakness is their inability to construct a logically sound argument. They piece together text that looks correct but often contains fundamental errors. Panova notes, "It would reverse the arrow [of an inequality] the direction at some point arbitrarily."
  • She also points out that models like ChatGPT often invent fake references, a critical flaw for research applications. While models with search capabilities (like Copilot) are better at finding real papers, their output must still be treated with extreme caution.

AI as a Flawed Student: The Missing Logic Layer

  • When asked for advice she would give an AI model as if it were her student, Panova explains that the analogy breaks down due to the AI's fundamental architecture.
  • The primary missing component is a connection to a proof verification system—a formal method for checking the correctness of each logical step (a minimal illustration follows this list). This is a concept familiar to crypto researchers working with formal verification for smart contracts.
  • Unlike a graduate student who can identify gaps in a proof, AI models currently lack this self-awareness and will confidently present flawed or incomplete arguments.
  • Strategic Implication: This highlights a major opportunity for Crypto AI. Integrating technologies like zkML (Zero-Knowledge Machine Learning)—which allows for the verifiable computation of AI models without revealing the model itself—could provide the "proof verification" layer that current models are missing, creating a pathway to more reliable and trustworthy AI systems.
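To make the proof-verification idea concrete, here is a minimal sketch in Lean 4 (an illustration of ours, not something discussed in the episode). A proof assistant's kernel checks every step mechanically, so the kind of arbitrarily reversed inequality arrow Panova describes would be rejected rather than silently accepted:

```lean
-- Accepted: the inequality points the right way, and Lean's kernel verifies it.
example (n : Nat) : n ≤ n + 1 := Nat.le_succ n

-- Rejected if uncommented: reversing the arrow, as Panova says current models
-- sometimes do mid-argument, fails type checking instead of slipping through.
-- example (n : Nat) : n + 1 ≤ n := Nat.le_succ n
```

Coupling a language model to such a checker would force every intermediate claim to be certified before it could be used, which is precisely the gap Panova identifies.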

Practical AI Applications in Mathematical Research

  • Despite their limitations in novel reasoning, Panova acknowledges that AI models are useful tools for automating mechanical and time-consuming research tasks.
  • She uses AI to generate code, test hypotheses by computing initial values, and summarize algorithms from academic papers (a sketch of this kind of exploratory computation follows this list).
  • However, she stresses the need for constant vigilance, as the output can be subtly wrong. "It sounds very convincing. It gives you something that looks like an answer and somewhere in the middle of it something might be switched and it [makes] the whole thing completely wrong."
  • Actionable Insight: For researchers, AI is best viewed as a powerful but unreliable assistant. It can accelerate the exploratory phase of research but cannot be trusted for final validation or logical proofs without rigorous human oversight.
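As a concrete illustration of the kind of mechanical, exploratory computation Panova describes offloading to AI tools, here is a short Python sketch of ours (the specific task and function name are our choice, not from the episode). It counts standard Young tableaux of a given partition shape via the hook length formula—a routine calculation one might run to check the first few values of a combinatorial hypothesis:

```python
from math import factorial

def num_standard_young_tableaux(shape: list[int]) -> int:
    """Count standard Young tableaux of a partition shape (weakly decreasing
    row lengths) using the hook length formula: n! / product of hook lengths."""
    n = sum(shape)
    # Conjugate partition: column lengths of the Young diagram.
    conjugate = [sum(1 for row_len in shape if row_len > col) for col in range(shape[0])]
    hook_product = 1
    for i, row_len in enumerate(shape):
        for j in range(row_len):
            # Hook length = cells to the right + cells below + the cell itself.
            hook_product *= (row_len - j) + (conjugate[j] - i) - 1
    return factorial(n) // hook_product

# Sanity check against known values: shapes (2,1), (2,2), (3,2) have 2, 2, 5 tableaux.
print([num_standard_young_tableaux(s) for s in ([2, 1], [2, 2], [3, 2])])
```

The closing sanity check against known values is exactly the kind of human verification Panova's warning about subtly wrong output calls for, whether the code was written by a person or generated by a model.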

The Illusion of Creativity: AI's Search vs. Human Intuition

  • Panova explores the nature of AI's apparent creativity, suggesting it is more a function of massive-scale pattern matching than a process analogous to human intuition.
  • She recounts a positive surprise where ChatGPT correctly executed a complex algorithm from one of her papers to generate a valid example, something that had frustrated her and her colleagues.
  • However, she contrasts this with AI's tendency to produce answers that are correct in conclusion but based on entirely wrong reasoning, reminiscent of the mathematician Ramanujan's famously intuitive but unproven formulas.
  • Panova theorizes that what we perceive as "creative leaps" in AI is likely the result of a hidden, massive search across its training data, connecting disparate concepts in a way humans cannot easily replicate. We don't see the vast search process, only the surprising result.

The Future of Mathematics: Coexistence or Obsolescence?

  • Looking ahead, Panova offers a nuanced and cautionary perspective on how advanced AI will reshape the field of mathematics and intellectual work in general.
  • Short-Term: AI will act as a powerful assistant, handling simple proofs, calculations, and examples, thereby accelerating research. However, this will also likely lead to a flood of low-quality, AI-generated papers, straining the peer-review system.
  • Long-Term: The role of the mathematician will shift towards "editorial" functions—selecting valuable problems, guiding research direction, and interpreting results. The human ability to maintain a "big picture" will remain critical.
  • She makes a stark claim: if AI can perform math at the level of a professor, it can perform almost any other intellectual job, leading to massive societal disruption. She rates AI's potential to reshape math at 8/10 and the world at 9/10 (where 10 is equivalent to the Industrial Revolution).

A Call to Action: The Role of Mathematicians in AI Benchmarking

  • Panova emphasizes the critical importance of expert-led benchmarking to prevent a dangerous overestimation of AI's capabilities.
  • Efforts like the Frontier Math Symposium are essential for accurately measuring what AI can and cannot do.
  • A major risk is that current benchmarks often rely on getting a single, complex numerical answer. An AI could arrive at the correct number through flawed reasoning, leading to false conclusions about its abilities.
  • Strategic Warning: She warns against a scenario where misinterpreted benchmark results lead decision-makers to declare mathematics "solved" by AI, potentially defunding the entire field. "We don't want to get to a point where... somebody higher up says oh well then math is done... That would be horrible."

Final Verdict and a Word of Caution

  • Panova concludes by directly refuting recent media hype suggesting that AI is already performing at the level of a good PhD student, a claim she calls "wrong."
  • She clarifies that while many mathematicians are excited, the community consensus is that current models are not nearly as capable as skilled human researchers.
  • Her forecast for her own problem being solved is about 1.5 years, but only if significant effort is made by mathematicians to guide the AI's training—it won't happen simply by scaling existing large language models.
  • Investor Takeaway: The narrative that AI is on the verge of solving all of mathematics is dangerously oversimplified. Progress requires deep, domain-specific human expertise to guide training, evaluation, and the development of new architectures that incorporate logical verification.

This discussion underscores that AI's path to true mathematical intelligence is blocked by fundamental reasoning gaps, not just a lack of data. For investors and researchers, the key opportunity lies in developing systems that integrate formal proof verification, creating a foundation for more reliable and strategically valuable AI.
