a16z
October 13, 2025

Columbia CS Professor: Why LLMs Can’t Discover New Science

Columbia University Professor Vishal Misra brings an information-theoretic lens from his networking background to cut through the AGI hype, offering a formal model for what Large Language Models can—and fundamentally cannot—do. His work provides a rigorous framework for understanding that LLMs are powerful navigators of existing knowledge, but are architecturally incapable of true scientific discovery.

The Bayesian Manifold: How LLMs Really Reason

  • "It sort of reduces the world into these Bayesian manifolds. As long as the LLM is traversing through these manifolds, it is confident... The moment it veers away... it starts hallucinating."
  • LLMs don’t just parrot text; they build a compressed map of their training data, reducing a complex world into lower-dimensional geometric structures called "Bayesian manifolds." Reasoning is the act of confidently moving along the paths of these learned manifolds.
  • This model elegantly explains in-context learning. When you provide examples in a prompt, you’re not "teaching" the model in real-time. Instead, you're providing evidence that helps it compute a Bayesian posterior, effectively guiding it onto the correct manifold to solve the problem.

The Entropy Engine

  • "When you add more context, you make the prompt information-rich, [and] the prediction entropy reduces."
  • "That's why Chain of Thought works. It starts breaking the problem into small steps... and once it breaks it down, then it's confident."
  • LLM behavior is governed by entropy. High-information prompts (specific, rare phrases) combined with low prediction entropy (a clear, predictable next step) create confidence and reduce hallucinations. A vague prompt leads to high prediction entropy, where the model has too many paths and is likely to produce nonsense.
  • Chain-of-Thought is a practical application of this principle. It forces the LLM to break a problem into a sequence of small, algorithmic steps, each with very low prediction entropy, ensuring it stays on a reliable path to the correct answer.

The AGI Litmus Test: Navigators vs. Creators

  • "Any LLM that was trained on pre-1915 physics would never have come up with a theory of relativity. Einstein had to reject Newtonian physics... He completely rewrote the rules."
  • "Right now, these models navigate, they do not create. AGI will create new manifolds."
  • Today's LLMs are navigators, operating within the "inductive closure" of their training data. They can synthesize, connect, and fill in the gaps within the known universe of information, but they cannot step outside of it.
  • True AGI, by Misra’s definition, will be a creator of new manifolds. It won't just solve problems using existing math; it will invent new branches of mathematics, requiring it to reject the very axioms it was trained on—a feat current architectures cannot achieve.

Key Takeaways:

  • LLMs are Navigators, Not Discoverers. They are masters of interpolation within their training data but are architecturally barred from making the intuitive leaps required for true scientific breakthroughs. Don’t expect a Transformer to produce the next theory of relativity.
  • The Innovation Plateau is Real. Simply throwing more data and compute at current architectures will only "smoothen out" existing knowledge manifolds, not create new ones. This path leads to incremental gains, like an iPhone getting a better camera, not a paradigm shift.
  • Entropy is the Key to Control. For developers, effective prompting is entropy management. By crafting specific, context-rich prompts, you reduce the model's prediction entropy, forcing it onto a confident, low-hallucination path to a reliable output.

For further insights, watch the discussion here: Link

This episode reveals the fundamental architectural limits of current LLMs—they are powerful navigators of existing knowledge, not creators of new science, a critical distinction for anyone investing in the future of AI.

The Information Theory Lens on LLMs

  • Martin Casado introduces Vishal Misra, a Columbia University Computer Science professor, highlighting their shared background in networking. Martin praises Vishal's work for providing the most predictive and formal models for understanding how LLMs operate. Unlike the prevailing discourse, which often swings between hype ("AGI is here") and oversimplification ("they're just stochastic parrots"), Vishal applies principles from information theory to create a structured framework for analyzing LLM reasoning.
  • Martin explains his interpretation of Vishal's core idea: LLMs reduce the complex, multi-dimensional universe of information into a lower-dimensional geometric structure called a Bayesian Manifold. This manifold represents a high-confidence state space where the model can reason effectively.
    • Bayesian Manifold: A conceptual geometric space representing the knowledge an LLM has learned. When reasoning, the LLM moves along paths within this manifold where it has high confidence; straying from it leads to hallucinations.
    • Martin notes, "We take this very complex heavy-tailed stochastic universe and we reduce it to kind of this geometric manifold and then when we reason we just move along that manifold."

Entropy, Prompts, and Predictability

  • Vishal confirms Martin's summary and elaborates on the mechanics. At their core, LLMs generate a probability distribution for the next token (word or part of a word). The key to understanding their behavior lies in the entropy of this distribution.
    • Information Entropy (Shannon Entropy): A measure of uncertainty in a probability distribution. Low entropy means the next token is highly predictable, while high entropy means many tokens are plausible.
    • LLMs perform best with prompts that are high in information (specific, rare) but lead to low prediction entropy (a clear, predictable next step).
    • For example, "I'm going out for dinner" is a low-information prompt leading to high prediction entropy (many possibilities).
    • In contrast, "I'm going to dinner with Martin Casado" is a high-information prompt that reduces prediction entropy, as the LLM can infer a narrower set of likely restaurant types.
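
To make the entropy contrast concrete, here is a minimal sketch with made-up toy distributions (illustrative numbers, not real model outputs) that computes the Shannon entropy of the next-token distribution for a vague prompt versus a specific one:

```python
import math

def shannon_entropy(dist):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy next-token distributions; the tokens and probabilities are assumptions.
vague_prompt = {       # "I'm going out for dinner ..."
    "tonight": 0.20, "with": 0.20, "at": 0.15, "later": 0.15,
    "soon": 0.10, "downtown": 0.10, "alone": 0.10,
}
specific_prompt = {    # "I'm going to dinner with Martin Casado ..."
    "at": 0.70, "tonight": 0.20, "in": 0.10,
}

print(f"vague prompt:    {shannon_entropy(vague_prompt):.2f} bits")   # ~2.75 bits
print(f"specific prompt: {shannon_entropy(specific_prompt):.2f} bits") # ~1.16 bits
```

The information-rich prompt concentrates probability mass on far fewer continuations, which is exactly the low-entropy, high-confidence regime described above.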

The Mechanics of Reasoning and Chain of Thought

  • Vishal explains that this entropy-reduction mechanism is why techniques like Chain of Thought are effective.
    • Chain of Thought: A prompting method where the LLM is asked to break down a problem into smaller, sequential steps before providing a final answer.
    • When asked a complex question like "What is 769 * 1025?", the LLM's initial prediction entropy for the answer is high and diffuse.
    • By invoking an algorithmic, step-by-step process (like long multiplication), the prediction entropy at each stage becomes very low. The model knows exactly what to do next.
    • Vishal states this process allows the LLM to "arrive at an answer which you're confident of and which is correct." This demonstrates that LLMs excel when they can follow learned, structured procedures within their known manifold.
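
As a rough illustration of why the decomposition helps (a sketch of the idea, not of the model's internals), the multiplication can be broken into steps where each intermediate result is essentially forced:

```python
# Decomposing 769 * 1025 into small, deterministic steps, each analogous to a
# low-entropy "next token" in a chain-of-thought trace (illustration only).
a, b = 769, 1025

step1 = a * 1000          # 769 * 1000 = 769,000
step2 = a * 25            # 769 * 25   = 19,225
answer = step1 + step2    # 769,000 + 19,225 = 788,225

print(f"{a} * {b} = {a}*1000 + {a}*25 = {step1} + {step2} = {answer}")
assert answer == a * b
```

Predicting 788,225 in one shot is a high-entropy guess; predicting each partial product, given the one before it, is nearly deterministic.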

From Cricket Statistics to Accidental Innovation

  • Vishal shares the origin story of his work on LLMs, which began with a personal project to fix the cumbersome user interface of Cricinfo's statistics database, StatsGuru. The goal was to replace a complex web form with a natural language query system.
    • In July 2020, using the early GPT-3 API, he found the model couldn't handle the database's complexity directly due to its small 2,048-token context window.
    • To solve this, he created a system where a user's query would retrieve similar example queries from a database and feed them into the GPT-3 prompt. This provided the necessary context for the model to generate the correct structured query.
    • This method, which he built 15 months before ChatGPT's release, was an accidental invention of what is now known as RAG (Retrieval-Augmented Generation).
    • RAG: An AI framework that retrieves data from an external knowledge base to ground LLM responses in factual, specific information, reducing hallucinations.
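
A minimal sketch of that retrieve-then-prompt loop might look like the following; the similarity metric, the example store, and the prompt format are assumptions for illustration, not the actual StatsGuru implementation:

```python
# Illustrative retrieve-then-prompt (RAG-style) sketch; all names and data
# here are hypothetical, not the real Cricinfo/StatsGuru system.

def similarity(a: str, b: str) -> float:
    """Crude token-overlap similarity between two natural-language queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Store of (natural-language query, structured query) example pairs.
EXAMPLES = [
    ("most runs by Tendulkar in 1998", "filter=player:Tendulkar;year:1998;stat:runs"),
    ("highest score at Lord's", "filter=ground:Lords;stat:high_score"),
    ("wickets by Warne against India", "filter=player:Warne;opposition:India;stat:wickets"),
]

def build_prompt(user_query: str, k: int = 2) -> str:
    """Retrieve the k most similar examples and prepend them as few-shot context."""
    ranked = sorted(EXAMPLES, key=lambda ex: similarity(user_query, ex[0]), reverse=True)
    shots = "\n".join(f"Q: {q}\nA: {s}" for q, s in ranked[:k])
    return f"{shots}\nQ: {user_query}\nA:"

print(build_prompt("most runs by Lara in 1999"))
# The assembled prompt would then be sent to the completion API.
```

The retrieved examples serve the same role as evidence in the Bayesian-posterior view: they pull the model onto the right manifold before it generates the structured query.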

The Plateauing Pace of LLM Advancement

  • Reflecting on the evolution since GPT-3, Vishal expresses surprise at the rapid pace of development but now sees signs of a plateau. He compares the current state of LLMs to the iPhone—after initial revolutionary leaps, recent iterations offer only incremental improvements, like a slightly better camera.
    • He observes that across models from OpenAI, Anthropic, and Google, "the capabilities of LLMs has not fundamentally changed. They've become better, right? They've improved but they have not crossed into a different realm."
    • Strategic Implication: For investors, this suggests that simply scaling current transformer architectures with more data and compute may yield diminishing returns. The next major breakthrough will likely require a new architectural paradigm.

A Formal Model: The Matrix Abstraction

  • Martin praises Vishal for developing a formal model to analyze LLMs while others were focused on rhetoric. Vishal outlines his "matrix abstraction" to explain their inner workings.
    • Imagine a massive matrix where each row is a possible prompt and each column is a token in the LLM's vocabulary. Each cell contains the probability of that token following that prompt.
    • This matrix is astronomically large—more rows than atoms in the known universe—and thus cannot be stored directly. It is also extremely sparse, as most token sequences are nonsensical.
    • LLMs create a compressed, interpolated representation of this matrix based on their training data. When given a new prompt, they use it as evidence to compute a Bayesian posterior distribution for the next token.
    • This model elegantly explains in-context learning (or few-shot learning), where providing examples in a prompt acts as new evidence that allows the LLM to update its predictions and learn a new task on the fly without retraining.
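
A toy version of this matrix view, shrunk to a handful of prompts and tokens (the smoothing scheme and data are illustrative assumptions, not Misra's formal construction), shows how observed counts act as evidence for a posterior over the next token:

```python
from collections import Counter, defaultdict

VOCAB = ["restaurant", "home", "office", "park"]

# A few rows of the (in reality astronomically large and sparse) prompt-by-token
# matrix, built here directly from observed continuations.
observed = [
    ("dinner with Martin at a", "restaurant"),
    ("dinner with Martin at a", "restaurant"),
    ("meeting Martin at the", "office"),
    ("meeting Martin at the", "park"),
]
counts = defaultdict(Counter)
for prompt, token in observed:
    counts[prompt][token] += 1

def posterior(prompt, alpha=0.1):
    """Smoothed next-token distribution for one row of the matrix.
    Observed counts act as evidence; an unseen prompt falls back to a
    uniform, maximum-entropy distribution (off the learned manifold)."""
    row = counts.get(prompt, Counter())
    total = sum(row.values()) + alpha * len(VOCAB)
    return {tok: (row[tok] + alpha) / total for tok in VOCAB}

print(posterior("dinner with Martin at a"))  # sharply peaked on "restaurant"
print(posterior("a prompt never seen"))      # uniform: maximum entropy
```

In-context examples play the same role as extra rows of evidence: they sharpen the posterior for the task at hand without any retraining.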

Why Recursive Self-Improvement is Unlikely

  • Using this framework, Vishal argues against the possibility of recursive self-improvement with current architectures. The output of an LLM is the inductive closure of its training data—it can only generate conclusions that can be logically derived from what it has already seen.
    • Inductive Closure: The complete set of knowledge that can be inferred from an initial set of data. For LLMs, this means their outputs are fundamentally bound by their training data.
    • Even with multiple LLMs interacting, they cannot generate truly new information because they are all operating within the same closed system defined by their initial training.
    • Vishal gives a powerful example: "Any LLM that was trained on pre-1915 physics would never have come up with a theory of relativity. Einstein had to sort of reject the Newtonian physics and come up with this space-time continuum."
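
One way to make the "inductive closure" intuition concrete is a fixpoint computation over facts and inference rules (an illustrative formalization with placeholder facts and rules, not Misra's exact definition):

```python
# Forward-chaining closure: keep applying rules until nothing new is derivable.
# Everything a system confined to these rules can ever output lies inside
# the resulting closed set.
facts = {"newtonian_mechanics", "maxwell_equations"}
rules = [
    # (premises, conclusion) pairs; hypothetical placeholders for derivable results
    ({"newtonian_mechanics"}, "orbital_mechanics"),
    ({"maxwell_equations"}, "electromagnetic_waves"),
    ({"newtonian_mechanics", "maxwell_equations"}, "classical_field_models"),
]

closure = set(facts)
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= closure and conclusion not in closure:
            closure.add(conclusion)
            changed = True

print(sorted(closure))
```

Nothing outside the closure is reachable; "special_relativity" never appears because deriving it required rejecting an axiom in the starting set, which is precisely the move a navigator cannot make.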

Defining AGI: Creating New Manifolds

  • Vishal defines AGI not as a system that is merely intelligent but as one that can create new knowledge and scientific paradigms.
    • Current LLMs are navigators; they are exceptionally good at exploring and connecting dots within the existing manifolds of human knowledge they were trained on.
    • True AGI will be a creator; it will have the ability to generate entirely new manifolds—discovering new axioms, new branches of mathematics, or new laws of physics.
    • He argues that achieving this requires a new architecture, as simply adding more data or compute to current models will only "smoothen out the already existing manifolds."

The Path Forward: Beyond Current Architectures

  • Vishal believes the next leap requires a new architecture that sits on top of or replaces LLMs. He points to several promising, albeit early, research directions:
    • Simulation: Developing models that can run approximate mental simulations to test ideas, much like a human catching a ball, rather than relying solely on language processing.
    • Energy-Based Models: Architectures like Yann LeCun's JEPA (Joint Embedding Predictive Architecture), which aim to learn more abstract representations of the world.
    • Analyzing Failures: Studying why LLMs fail on abstract reasoning benchmarks (like the ARC Prize) to reverse-engineer the requirements for a more capable architecture.
    • Actionable Insight: Researchers and investors should prioritize and explore these alternative architectures, as they represent the most likely path to overcoming the inherent limitations of today's LLMs.

Conclusion

  • This discussion provides a formal framework for understanding LLM limitations: they are powerful Bayesian reasoners confined to the knowledge manifolds of their training data. For investors and researchers, the key takeaway is that true AGI requires a new architecture capable of creating knowledge, not just navigating it.
