Machine Learning Street Talk
July 4, 2025

AI doesn't work the way you think it does

This episode, featuring insights from AI researcher Kenneth Stanley, author of "Why Greatness Cannot Be Planned," explores a provocative idea: today's AI is an impostor. While its performance is dazzling on the surface, its internal architecture is a chaotic mess, a fundamental flaw that could stall the pursuit of true artificial intelligence.

The AI Impostor Problem

  • "If you just look at the output, it's great. It looks exactly like a skull, but underneath the hood, it's not capturing any of the underlying components or the regularity. In some sense, it's not really a skull. It's an impostor."
  • The dominant method for training AI, Stochastic Gradient Descent (SGD), creates what the episode calls "fractured, entangled representations"—or more bluntly, "total spaghetti." While these models can ace benchmarks and generate convincing outputs, they lack a deep, structured understanding of the world. They are masters of elaborate memorization, not genuine comprehension, much like a student who memorizes formulas for a test but can't derive them from first principles.

A Tale of Two Architectures

  • "With conventional SGD... you get a garbage representation. Just total spaghetti. The way these new networks learn is completely different. The representations they create are beautiful... a unified factored representation."
  • A groundbreaking paper reveals an alternative path. In systems like Picbreeder, which use open-ended, bottom-up learning, the AI builds "unified, factored" models of the world.
    • Unified Models: These networks build clean, modular, and shockingly intuitive representations. Changing a single parameter might open a mouth or wink an eye, showing a deep understanding of concepts.
    • SGD Models: In conventional networks, tweaking a parameter produces meaningless, chaotic distortions. This is the difference between a real castle and a sandcastle: one has structural integrity, the other just looks the part.

The Power of Not Looking

  • "The stepping stones that lead to these interesting artifacts that you might want to find don't resemble them... sometimes the only way to find something is by not looking for it."
  • The secret to building better AI lies in abandoning fixed objectives and accepting that search is "deceptive": the path to a breakthrough often doesn't look like the goal itself. Searching directly for a skull image in Picbreeder failed, but users who explored "interesting" symmetrical patterns stumbled upon it serendipitously. This open-ended search implicitly selects for more "evolvable" and robust foundations, rewarding discovery over narrow optimization.

Key Takeaways:

  • The podcast argues that our current approach of scaling up "impostor" AIs will hit a wall, becoming insanely expensive while failing to produce true creativity or continual learning. The blind pursuit of benchmarks may be blocking us from discovering real intelligence.
  • Today's AI is a Brilliant Impostor. It excels at mimicry but its internal "spaghetti" wiring reveals a lack of deep, structural understanding, limiting its potential for genuine creativity.
  • The Objective is the Obstacle. Directly optimizing for specific goals, the core of modern AI training, is a deceptive trap. True innovation comes from open-ended exploration where the destination is unknown.
  • Diversify the AI Portfolio. The industry's singular focus on scaling massive, objective-driven models is a high-risk bet. Investing in alternative, bottom-up paradigms is crucial for discovering more robust and truly intelligent systems.

For further insights and detailed discussions, watch the full podcast: Link

This episode reveals why today's AI models are brilliant 'impostors,' acing benchmarks while lacking true understanding, and explores an alternative path to genuine machine intelligence.

The "Impostor" Problem: AI's Brilliant Facade

  • The podcast opens by challenging the prevailing optimism around AI. While models can produce breathtaking results, their internal structures are a chaotic mess. The dominant training method, Stochastic Gradient Descent (SGD), an iterative, brute-force optimization process that repeatedly adjusts a model's parameters to reduce error, creates what the speakers call "garbage representations" (a minimal sketch of this update loop follows this list).
  • The internal wiring of these models is described as "total spaghetti," lacking any intuitive or logical structure.
  • This raises a critical question for the entire field: if the underlying mechanics are fundamentally flawed, how can the outputs be so convincing?
  • The answer proposed is that the AI has learned to be an "impostor." It perfectly mimics the desired output, like a sandcastle resembling a real castle, but lacks the structural integrity and true components of the real thing.
  • Kenneth Stanley, a key researcher cited, frames this starkly: "With conventional SGD... you get a completely different kind of garbage representation. Just total spaghetti."
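
To make the terminology concrete, here is a minimal, self-contained sketch of the kind of update loop SGD performs: repeatedly sample a batch, measure error, and nudge every parameter in the direction that reduces it. This is a generic Python illustration, not code from the episode or the paper.

```python
import numpy as np

# Toy data: y = 3x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=(200, 1))

# Parameters of a one-feature linear model, initialized randomly.
w, b = rng.normal(), rng.normal()
lr = 0.1  # learning rate

for step in range(1000):
    # Sample a mini-batch (the "stochastic" part of SGD).
    idx = rng.integers(0, len(x), size=32)
    xb, yb = x[idx], y[idx]

    # Forward pass and error under a mean-squared loss.
    err = (w * xb + b) - yb

    # Gradients of the loss with respect to each parameter.
    grad_w = 2.0 * np.mean(err * xb)
    grad_b = 2.0 * np.mean(err)

    # Nudge the parameters downhill. This local, error-driven update is the
    # only signal shaping the model's internal representation.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # approaches w=3, b=1
```

The speakers' point is that nothing in this loop rewards clean internal structure; only the final error matters, which is why the resulting representations can be "spaghetti" even when the outputs look perfect.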

Fractured Learning vs. Deep Understanding

  • The discussion draws a sharp distinction between two modes of learning: rote memorization versus foundational understanding. The "impostor" AI excels at the former, creating what the paper calls a fractured, entangled representation, where related concepts are disconnected and independent behaviors become intertwined.
  • The host provides a powerful analogy from his high school physics experience: one class required memorizing endless specific equations (fractured knowledge), while the other used calculus to derive solutions from first principles (unified understanding).
  • This difference is critical for future potential. Consider two mathematicians who both ace the same exam: one goes on to make groundbreaking discoveries, while the other never does. Today's LLMs are compared to the second mathematician: excellent at benchmarks but lacking the deep, structured knowledge required for true, out-of-distribution creativity.

An Alternative Paradigm: The Picbreeder Experiment

  • The conversation pivots to an alternative approach, rooted in an experiment by Kenneth Stanley called Picbreeder. This system allowed users to collaboratively "breed" images through an evolutionary process, leading to a profound insight.
  • The experiment revealed that users who directly searched for a specific image (e.g., a car or a butterfly) almost always failed.
  • In contrast, users who explored without a specific goal, simply selecting images they found "interesting," discovered a stunning variety of complex and beautiful forms.
  • This phenomenon is known as deception: the stepping stones to a valuable discovery often bear no resemblance to the final outcome, so searching directly for the goal steers you away from the winding, counterintuitive path where the solution actually lies (a toy contrast between the two search strategies follows this list).
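
The sketch below illustrates deception on a deliberately deceptive toy problem (a standard "trap" bitstring task, not Picbreeder itself; all names and parameters are illustrative assumptions). A hill climber that optimizes the objective directly gets pulled into a dead end, while a search that only rewards never-before-seen behaviors eventually stumbles onto the goal.

```python
import random

random.seed(1)
N = 20  # bitstring length; the "goal" artifact is the all-ones string

def trap_fitness(bits):
    """Deceptive objective: all-ones is the global optimum, but every partial
    step toward it scores worse than adding more zeros."""
    ones = sum(bits)
    return N if ones == N else (N - 1 - ones)

def mutate(bits):
    child = list(bits)
    child[random.randrange(N)] ^= 1  # flip one random bit
    return child

def objective_search(steps=5000):
    """Greedy search on the objective: keep a mutation only if fitness does not
    drop. The deceptive gradient drags it into the all-zeros dead end."""
    cur = [random.randint(0, 1) for _ in range(N)]
    for _ in range(steps):
        child = mutate(cur)
        if trap_fitness(child) >= trap_fitness(cur):
            cur = child
        if sum(cur) == N:
            return True
    return False

def exploratory_search(steps=5000):
    """Objective-free search: keep one example of every distinct behavior
    (here, the count of ones) and expand from any of them at random."""
    start = [random.randint(0, 1) for _ in range(N)]
    niches = {sum(start): start}
    for _ in range(steps):
        parent = random.choice(list(niches.values()))
        child = mutate(parent)
        niches.setdefault(sum(child), child)  # new behaviors are "interesting"
        if N in niches:
            return True
    return False

print("objective-driven reached goal:", any(objective_search() for _ in range(10)))
print("exploration reached goal:     ", any(exploratory_search() for _ in range(10)))
```

The objective-driven climber essentially never finds the all-ones string, while the behavior-collecting search reliably does, despite never evaluating the objective at all.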

The Power of Unified, Factored Representations

  • Unlike SGD, the open-ended, evolutionary approach from Picbreeder creates what the paper calls a unified, factored representation. These internal models are clean, modular, and shockingly intuitive, representing a true, abstract understanding of the data.
  • For investors and researchers, this is a critical distinction. A factored representation means the model has independently identified and encoded meaningful features of an object.
  • For example, in a network that learned to generate a skull, one parameter might control the mouth opening and closing, while another makes it smile. This demonstrates a deep, component-level understanding.
  • In conventional networks, adjusting a single parameter produces meaningless, chaotic noise. This is the impostor at work: it can show you a skull, but it doesn't know what a skull is (a toy contrast between factored and entangled parameters follows this list).
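
As a rough illustration of the distinction (not the actual Picbreeder network; the feature names and mixing matrix here are invented for the example), the sketch below compares a factored generator, where each parameter maps to exactly one semantic feature, with an entangled one, where every parameter bleeds into every feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical semantic features of a generated "skull" image.
FEATURES = ["mouth_open", "smile", "eye_wink", "jaw_width"]
base = np.array([0.2, 0.5, 0.0, 1.0])  # feature values of the base image

def factored_generate(params):
    """Unified, factored representation: each parameter maps to exactly one
    semantic feature, so a single tweak changes a single thing."""
    return base + params

# Entangled representation: every parameter bleeds into every feature through
# a dense random mixing matrix (a stand-in for "spaghetti" wiring).
mix = rng.normal(size=(4, 4))

def entangled_generate(params):
    return base + mix @ params

def report(delta):
    """Show how far each feature moved from the base image."""
    return {name: round(float(d), 2) for name, d in zip(FEATURES, delta)}

# Tweak only the first parameter ("mouth_open") in both models.
tweak = np.array([0.8, 0.0, 0.0, 0.0])

print("factored :", report(factored_generate(tweak) - base))   # only mouth_open moves
print("entangled:", report(entangled_generate(tweak) - base))  # every feature shifts
```

In the factored case, one knob means one concept; in the entangled case, the same knob smears arbitrary change across everything, which is the behavior described for conventional SGD-trained networks.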

Deception, Serendipity, and the Evolution of Evolvability

  • The episode explains how these superior representations emerge. The process is not random but guided by human intuition for what is "interesting," which locks in useful foundational concepts over time.
  • On the path to discovering a skull image in Picbreeder, users selected a symmetrical ancestor not because it looked like a skull, but because symmetry was an interesting property. This "locked in" symmetry as a building block for all future generations.
  • This process creates an elegant hierarchy of features, building complex understanding from simple, interesting blocks.
  • Arash, the paper's co-author, introduces a critical principle: the evolution of evolvability. When choosing between a "spaghetti" representation and a modular one, users and evolutionary pressure will implicitly favor the modular one because it offers more potential for interesting future discoveries (a toy demonstration follows this list).
  • Arash notes: "If there's like two versions of the skull, which is one is like spaghetti and one is like very modular and composable... the one that's more evolvable will be the one that wins out, right?"
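
A minimal sketch of why the modular representation tends to win, under invented assumptions: two encodings produce the same symmetric pattern, but only the modular one survives random mutation with its locked-in regularity (symmetry) intact, so selection for "interesting" offspring implicitly favors it.

```python
import numpy as np

rng = np.random.default_rng(0)
HALF = 4  # phenotype is an 8-pixel strip; "interesting" = left-right symmetric

def modular_phenotype(genes):
    """Modular encoding: genes describe one half, the other half is mirrored.
    Symmetry is baked into the representation, so every mutant keeps it."""
    return np.concatenate([genes, genes[::-1]])

def entangled_phenotype(genes):
    """Entangled encoding: one gene per pixel. The starting genotype happens
    to be symmetric, but nothing in the representation protects that."""
    return genes

def is_symmetric(phenotype):
    return np.allclose(phenotype, phenotype[::-1], atol=1e-6)

def mutation_keeps_symmetry(phenotype_fn, genes, trials=1000):
    """Fraction of random single-gene mutations whose offspring stay symmetric,
    a crude proxy for how evolvable the representation is."""
    kept = 0
    for _ in range(trials):
        child = genes.copy()
        child[rng.integers(len(child))] += rng.normal()
        kept += is_symmetric(phenotype_fn(child))
    return kept / trials

modular_genes = rng.normal(size=HALF)
entangled_genes = np.concatenate([modular_genes, modular_genes[::-1]])  # same image

print("modular  :", mutation_keeps_symmetry(modular_phenotype, modular_genes))
print("entangled:", mutation_keeps_symmetry(entangled_phenotype, entangled_genes))
# The modular parent's offspring are always symmetric; the entangled parent's
# almost never are, so selecting for interesting offspring favors the modular one.
```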

Strategic Implications for LLMs and Crypto AI

  • This research has profound implications for the current trajectory of AI, particularly for Large Language Models (LLMs). If LLMs are impostors, their ability to generalize, be creative, and learn continually is fundamentally limited.
  • For Investors: The current strategy of simply scaling larger models with more data and compute may hit a wall of diminishing or even negative returns. The immense cost of training may be a symptom of building on a flawed, "impostor" foundation.
  • For Researchers: The focus on benchmark performance may be blinding the field to these underlying structural flaws. An LLM can appear human-level on in-distribution tasks but may be incapable of the creative, out-of-distribution reasoning needed for AGI or novel scientific discovery.
  • The speaker warns this could lead to an unsustainable future: "If it's an impostor underneath the hood then these kinds of things are going to hit a wall or become insanely expensive."

Conclusion: A Call for Diversified AI Investment

  • The episode argues that the blind pursuit of objective-based optimization is creating brittle, costly, and ultimately limited AI. Investors and researchers must recognize this risk and diversify their strategies by exploring open-ended, exploratory systems that can build robust, truly intelligent models from the ground up.
