a16z
June 4, 2025

How Fei-Fei Li Is Rebuilding AI for the Real World

This podcast dives into the next frontier of AI with Fei-Fei Li, the renowned "godmother of AI" and founder of World Labs, and her investor Martin Casado. They explore why current AI, dominated by language models, falls short of true intelligence and how building "world models" for 3D spatial understanding will unlock a new era of innovation.

1. Beyond Language: The Call for Spatial Intelligence

  • "Language is an incredibly powerful encoding of thoughts and information, but it's actually not a powerful encoding of what the 3D physical world that all animals and living things live in." - Fei-Fei Li
  • "If I put you in a room and I blindfolded you and I just described the room and then I asked you to do a task, the chances of you being able to do it are very little... On the other hand, if I took off the blindfold and you can see the actual space... then you can actually go and manipulate things." - Martin Casado
  • While Large Language Models (LLMs) are transformative, they lack an understanding of the physical, 3D world, the domain in which much of animal and human intelligence evolved and operates. Fei-Fei Li describes language as a "lossy" and "purely generative" signal, in contrast to the physical world, which exists independently of any description of it.
  • Spatial intelligence, the ability to perceive and interact with 3D environments, is fundamental. Human evolution and innovation, from basic survival to discovering DNA's double helix, heavily rely on it.

2. World Labs: Pioneering 3D Foundational Models

  • "My intellectual journey is not about company or papers, it's about finding the northstar problem... the time has come that concentrated industry-grade effort focused effort in terms of compute, data, talent is really the answer to bringing this to life." - Fei-Fei Li
  • Fei-Fei Li founded World Labs to tackle this "northstar problem" of creating AI that genuinely understands 3D space, an endeavor requiring "deep tech" expertise in AI, computer vision, and graphics.
  • She sought an "intellectual partner" in her first investor, Martin Casado, who shared her conviction about the necessity of "world models" when many others didn't grasp the concept beyond polite nods.

3. Unleashing 3D AI: From Robotics to the Multiverse

  • "Suddenly, we can actually create infinite universes. Some are for robots, some are for creativity, some are for socialization... It suddenly will enable us to live in the multiverse." - Fei-Fei Li
  • World models promise to revolutionize fields like robotics (any embodied machine needing to navigate physical space) and creativity (architecture, industrial design, entertainment).
  • The technology is profoundly horizontal, enabling computers to take 2D views (images, videos) and generate full, interactive 3D representations, including unseen parts or entirely new environments.
  • This opens possibilities from enhanced productivity tools to the creation of boundless digital worlds, impacting how we work, play, and interact. Fei-Fei Li’s temporary loss of stereo vision underscored the critical, non-obvious importance of 3D perception for real-world tasks.

Key Takeaways:

  • AI's next evolutionary leap lies in mastering 3D physical reality, a challenge World Labs is spearheading. This requires moving beyond language-centric models to embrace spatial intelligence, unlocking capabilities that are fundamental to how we, and future intelligent systems, will interact with the world.
  • Spatial is Special: The 3D world is AI's next grand challenge; understanding it is key to more general intelligence.
  • Deep Tech, Deep Impact: Building foundational 3D world models is a complex, resource-intensive endeavor with transformative, cross-industry potential.
  • Beyond Reconstruction, Towards Creation: 3D AI will not only help us understand and navigate our world but also empower us to generate and experience infinite new realities.

This episode explores the critical leap beyond language models with Fei-Fei Li and Martin Casado, detailing why "world models" capable of understanding 3D space are the next frontier in AI, and the profound implications for creating truly intelligent systems.

The Genesis of World Labs: A Shared Vision for Spatial AI

  • Martin Casado introduces Fei-Fei Li, often referred to as the "godmother of AI," highlighting her singular contribution of bringing data to the forefront of AI development—a concept now recognized as fundamental.
  • Fei-Fei Li explains her choice of Martin as World Labs' first investor, emphasizing her search for more than just capital. She sought an "intellectual partner" with deep technical acumen to navigate the complexities of building deep tech.
    • Fei-Fei Li: "I was also particularly looking for an intellectual partner... who is a computer scientist who is a student of AI who is understand product market."
  • Speaker Analysis: Martin, an accomplished entrepreneur and investor with a Stanford PhD, provides authoritative context on Fei-Fei's foundational impact on AI. Fei-Fei, a distinguished Stanford professor and founder of World Labs, articulates the necessity of profound technical understanding and collaborative partnership in her ventures.
  • Actionable Insight: For Crypto AI investors, this underscores the strategic advantage of conducting deep technical due diligence and fostering partnerships with founders who value genuine intellectual collaboration, particularly when venturing into frontier technologies like advanced AI.

Beyond Language: The "World Model" Epiphany

  • The foundational idea for World Labs emerged from a pivotal conversation where Fei-Fei Li and Martin Casado both identified a critical gap in current AI capabilities: the limitations of Large Language Models (LLMs). LLMs are AI models extensively trained on text data to comprehend and generate human-like language.
  • They converged on the concept that a "world model"—an AI capable of understanding and reasoning about 3D space and physical interactions—was the essential missing component for AI to truly understand and navigate the world.
    • Martin Casado (recounting the conversation): "Fei-Fei leans over to me. She's like, 'You know what we're missing?' And I said, 'What are we missing?' She said, 'We're missing a world model.' And I'm like, 'Yes.'"
  • Fei-Fei further validated this shared vision by asking Martin to define his concept of a world model. His description—an AI model that genuinely understands the 3D structure, geometry, and compositional nature of the world—perfectly matched her own deeply considered perspective.
  • Speaker Analysis: The anecdote shared by both speakers reveals a serendipitous yet intellectually rigorous alignment, highlighting the thoughtful and convergent paths that led to their collaboration on World Labs.
  • Actionable Insight: Crypto AI researchers and investors should actively seek to identify fundamental missing pieces in current AI paradigms, moving beyond prevailing trends. These conceptual gaps often represent the most significant opportunities for groundbreaking innovation and market disruption.

The Unforeseen Power of Data and the Limits of Language

  • Fei-Fei Li, despite her pioneering role in establishing data as a cornerstone of modern AI, expresses ongoing astonishment at the advanced capabilities of data-intensive models, noting their "incredible emergent behaviors of thinking machine."
  • However, her intellectual journey led her to a crucial insight: while language is an incredibly potent tool for encoding thoughts and information, it is an inherently "lossy" medium for representing the complex, nuanced reality of the 3D physical world.
    • Fei-Fei Li: "Language is a lossy way to capture the world."
  • She posits that human intelligence, and indeed civilization itself, is fundamentally built upon our ability to perceive, understand, and interact with the physical environment—a domain that largely eludes direct and complete encoding through language alone. This profound realization was a primary motivator for establishing World Labs, aiming to address this "northstar problem" through a focused, industry-scale research and development effort.
  • Technical Term: Foundation Model: A large-scale AI model trained on a vast and diverse dataset, engineered to be adaptable for a wide array of specific tasks with further fine-tuning. LLMs are a prominent category of foundation models.
  • Actionable Insight: While data-driven AI systems like LLMs demonstrate remarkable power, Crypto AI investors should actively explore and support ventures developing models that address the inherent limitations or "lossy" aspects of current AI. Opportunities abound in systems designed for nuanced understanding and interaction with the physical, 3D world.

Why Language Models Aren't Enough: The Primacy of Spatial Intelligence

  • Martin Casado illustrates the inadequacy of language for conveying complex spatial information through a simple thought experiment: imagine a blindfolded person attempting to navigate an unfamiliar room solely based on verbal descriptions, versus being able to see the room directly. This starkly contrasts the imprecision of language with the richness of direct spatial perception, which our brains use to construct detailed 3D mental models for effective interaction.
  • He notes a surprising turn in AI development: language-focused AI (LLMs) reached its breakthroughs first, even though spatial problems such as robotics and navigation had received decades of intensive research and massive investment (by his estimate, around $100 billion in autonomous vehicles).
    • Martin Casado: "It's that language went first because we've like worked so hard on robotics, right? I mean, I feel like even to look at autonomous vehicles... as an industry, we've invested like a hundred billion dollars in it."
  • Fei-Fei Li adds an evolutionary perspective, pointing out that the brain regions dedicated to language processing are relatively recent developments, whereas the neural machinery for spatial navigation and understanding has been honed over approximately 500 million years of evolution.
  • Actionable Insight: The rapid advancement of LLMs, which tackle evolutionarily recent cognitive functions, suggests that AI systems addressing more ancient and deeply embedded forms of intelligence, such as spatial reasoning, could unlock even more transformative capabilities. This represents a frontier for Crypto AI research and investment.

The Critical Role of Spatial Intelligence and "Large World Models"

  • Fei-Fei Li, whose academic career has consistently centered on computer vision, emphasizes that the remarkable success of LLMs serves as an inspiration and catalyst for the development of Large World Models (LWMs). These are envisioned as sophisticated AI systems designed to comprehend, simulate, and interact with 3D environments.
  • She clarifies that the focus on LWMs is not a dismissal of language models but rather a recognition of their complementary roles. Spatial intelligence is presented as a critical, distinct component of overall intelligence.
    • Fei-Fei Li: "Space, the 3D space... the spatial intelligence that enable people to do so many things that's beyond language is a part of a critical part of intelligence."
  • To illustrate the power of non-linguistic, spatial reasoning, she cites scientific breakthroughs such as the discovery of DNA's double-helix structure and the determination of the buckyball's structure (buckminsterfullerene, C60), achievements that she argues could not have been reached through language alone.
  • Speaker Analysis: Fei-Fei's long-standing dedication to vision research lends significant weight to her conviction in the necessity of world models. She frames the advancements in LLMs not as a competing paradigm but as a development that brings the realization of powerful world models closer.
  • Actionable Insight: Crypto AI investors and researchers should explore how decentralized technologies could uniquely contribute to or leverage LWMs. Potential applications include the creation of persistent, verifiable, and shared virtual worlds, or enhancing the capabilities of robotics and Internet of Things (IoT) ecosystems through sophisticated spatial understanding.

Applications of World Models: From Creativity to the Multiverse

  • Fei-Fei Li outlines a wide spectrum of potential applications for advanced world models. These include significantly augmenting creativity in visually-intensive fields such as graphic design, filmmaking, architecture, and industrial design.
  • Furthermore, LWMs are poised to revolutionize robotics—defined broadly as any embodied machine, extending beyond humanoid forms to include autonomous vehicles and specialized industrial robots—by endowing them with the ability to robustly understand and navigate complex 3D spaces.
  • A particularly transformative prospect she highlights is the capacity to generate "infinite universes"—diverse digital and virtual worlds tailored for purposes such as advanced robot training simulations, immersive social interaction platforms, novel forms of virtual travel, and dynamic storytelling experiences.
    • Fei-Fei Li: "Suddenly we can actually create infinite universes... It suddenly will enable us to live in the multiverse."
  • This technology, which synergistically combines generative AI with 3D reconstruction capabilities, promises to fundamentally reshape human interaction with both digital and physical realities.
  • Technical Term: Robotics: An interdisciplinary branch of engineering and science that involves the conception, design, manufacture, and operation of robots. It encompasses control systems, sensory feedback, and information processing necessary for robotic function.
  • Actionable Insight: The capability to generate, simulate, and interact with complex 3D worlds, as envisioned for LWMs, has direct and profound implications for metaverse concepts. Crypto AI researchers should investigate how LWMs could underpin the development of more realistic, interactive, economically vibrant, and potentially decentralized virtual environments.

Concrete Capabilities: Making 3D Worlds Actionable for AI

  • Martin Casado clarifies that world models, much like LLMs, are "truly horizontal" technologies, meaning their applicability spans a vast range of domains.
  • He provides a concrete example of their function: these models can ingest a 2D visual input, such as a single photograph or video frame, and from it, generate a comprehensive 3D representation of the scene within a computer. This 3D model includes inferred information, such as the unseen back of an object.
  • This fully realized 3D representation can then be dynamically manipulated, accurately measured, and utilized for a multitude of tasks. These range from practical applications in architecture and industrial design to entirely generative uses in creating video game environments or artistic content.
    • Martin Casado: "With these models, you can take a view of the world like a 2D view... and then you could actually create a 3D full representation including what you're not seeing."
  • This core capability—to transform limited 2D inputs into rich, actionable 3D data—is pivotal for any application that requires an AI to understand or interact with three-dimensional space, from autonomous robotics to immersive digital art.
  • Actionable Insight: For Crypto AI investors, the "horizontal" nature of world models suggests a market potential and application breadth comparable to that of LLMs. Identifying and investing in early, high-impact use cases where sophisticated 3D understanding offers a decisive competitive advantage will be crucial.
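
The episode summary does not describe how World Labs' models work internally. As a minimal sketch of why a 3D representation can be "accurately measured" while a flat image cannot, the Python below assumes a simple pinhole camera and a per-pixel depth estimate (the part a learned model would have to supply). The intrinsics, pixel coordinates, and depths are made-up illustrative values, and `backproject` is a hypothetical helper name, not an API from any product discussed here.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a known depth to a 3D point in the camera frame.

    Uses the standard pinhole camera model: fx, fy are focal lengths in pixels
    and (cx, cy) is the principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics for a 640x480 camera (assumed values).
FX = FY = 500.0
CX, CY = 320.0, 240.0

# Two pixels on the left and right edges of an object, each with an
# assumed depth of 2 metres (in practice a model would predict this).
left_edge = backproject(220, 240, 2.0, FX, FY, CX, CY)
right_edge = backproject(420, 240, 2.0, FX, FY, CX, CY)

# In 3D the object's width is a plain Euclidean distance in metres.
width_m = math.dist(left_edge, right_edge)
print(f"estimated width: {width_m:.2f} m")
```

In the image alone, the two pixels are simply 200 pixels apart, which says nothing about physical size; only once depth is attached does the distance become a measurement in metres.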

The Indispensability of 3D for AI Interaction

  • The discussion strongly reinforces the fundamental necessity of 3D representation for AI systems designed to interact with the world. Core aspects of reality—physics, object interaction, and navigation—are inherently three-dimensional.
  • While humans possess the cognitive ability to infer 3D structure from 2D visual inputs (like a video), a computer or robot requires explicit 3D information to perform actions in space, such as accurately judging distances, grasping objects, or navigating complex environments.
  • Fei-Fei Li shares a compelling personal story: a temporary cornea injury caused her to lose stereo vision for several months. This experience made everyday tasks like driving, even in her own familiar neighborhood, profoundly difficult and frightening due to an impaired ability to judge distances accurately.
    • Fei-Fei Li: "I realized I don't have a good distance measure between my car and the parked car... I had to be so slow like almost 10 miles an hour so that I don't scratch the cars."
  • This anecdote powerfully illustrates the critical importance of 3D perception for safe and effective real-world interaction—a capability that world models aim to provide to AI systems.
  • Actionable Insight: AI systems intended for real-world applications—including robotics, autonomous systems, and augmented reality—will critically depend on robust and accurate 3D world modeling capabilities. This is not merely an enhancement but a foundational requirement for significant progress in these fields.
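
The distance-judging problem in the anecdote above has a well-known geometric counterpart in two-camera stereo: for a rectified camera pair, depth is focal length times baseline divided by disparity. The sketch below illustrates that relation only; it is not a model of human vision, which also relies on many other cues, and the focal length, baseline, and disparities are assumed values chosen for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d.

    focal_px: focal length in pixels
    baseline_m: distance between the two viewpoints in metres
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        # With no measurable shift between views (or only one view),
        # this relation cannot recover depth.
        raise ValueError("disparity must be positive to recover depth")
    return focal_px * baseline_m / disparity_px

FOCAL_PX = 800.0      # assumed focal length
BASELINE_M = 0.065    # assumed baseline, roughly the spacing between human eyes

near = depth_from_disparity(FOCAL_PX, BASELINE_M, 26.0)  # about 2 m
far = depth_from_disparity(FOCAL_PX, BASELINE_M, 2.0)    # about 26 m

# The same one-pixel disparity error shifts the far estimate much more
# than the near one, since depth varies as 1 / disparity.
near_err = depth_from_disparity(FOCAL_PX, BASELINE_M, 25.0) - near
far_err = depth_from_disparity(FOCAL_PX, BASELINE_M, 1.0) - far
print(f"near: {near:.2f} m (+{near_err:.2f}), far: {far:.2f} m (+{far_err:.2f})")
```

With a single view there is no disparity to measure, which is why the function refuses to return a depth in that case.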

The Research Landscape and World Labs' Approach to 3D AI

  • The field of research dedicated to world models is relatively nascent compared to the more established domain of LLMs, but it builds upon decades of foundational work in computer vision and computer graphics.
  • Fei-Fei Li highlights key enabling technologies, including Neural Radiance Fields (NeRF)—a groundbreaking technique for 3D scene reconstruction from 2D images, co-developed by World Labs co-founder Ben Mildenhall. She also mentions Gaussian Splatting, another innovative method for 3D representation and rendering, with pioneering contributions from World Labs co-founder Christoph Lassner. Early work in image generation using GANs (Generative Adversarial Networks)—a class of machine learning frameworks where two neural networks contest with each other in a game-like scenario to generate realistic outputs—also laid crucial groundwork.
    • Technical Term: Neural Radiance Fields (NeRF): A deep learning method that synthesizes novel views of a 3D scene by learning a continuous volumetric representation from a collection of 2D images with known camera poses.
    • Technical Term: Gaussian Splatting: A 3D scene representation and rendering technique that models scenes using a multitude of 3D Gaussian functions, enabling high-fidelity, real-time rendering.
  • World Labs is strategically consolidating global talent across computer vision, diffusion models, computer graphics, numerical optimization, and general AI to address the "singular big northstar problem" of creating powerful world models and translating these research breakthroughs into viable products.
    • Martin Casado: "It really feels like to solve this problem, you need experts both in AI... and graphics... It takes a very special team to actually crack this problem."
  • Speaker Analysis: Fei-Fei's concise overview of the research antecedents, combined with Martin's emphasis on the multidisciplinary team composition, underscores the complex, interdisciplinary expertise required to pioneer this next wave of AI.
  • Actionable Insight: The advancement of world models hinges on the synergistic convergence of AI algorithms and sophisticated computer graphics techniques. Crypto AI researchers should closely monitor innovations in NeRF, Gaussian Splatting, GANs, diffusion models, and related fields. Investors should prioritize teams demonstrating this rare and critical blend of expertise.
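
To make the "continuous volumetric representation" behind NeRF more concrete, here is a minimal sketch of the front-to-back compositing step that turns density and color samples along one camera ray into a pixel color. In a real NeRF the densities and colors come from a trained neural network queried at each sample position; here they are hard-coded, and `composite_ray` is a hypothetical name. Gaussian Splatting blends depth-sorted Gaussians using the same kind of alpha compositing, with opacities coming from projected Gaussians rather than sampled densities. This is an illustration under those assumptions, not World Labs' implementation.

```python
import math

def composite_ray(densities, colors, deltas):
    """Front-to-back compositing of samples along one camera ray.

    densities: non-negative volume density at each sample
    colors: (r, g, b) tuple at each sample
    deltas: spacing between consecutive samples along the ray
    Returns (rgb, accumulated_opacity).
    """
    transmittance = 1.0  # fraction of light not yet blocked
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(rgb), 1.0 - transmittance

# Made-up ray: empty space, then a dense red surface, then blue behind it.
densities = [0.0, 1000.0, 5.0]
colors = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
deltas = [0.1, 0.1, 0.1]

pixel_rgb, opacity = composite_ray(densities, colors, deltas)
print(pixel_rgb, opacity)
```

The empty first sample contributes nothing, the dense red sample absorbs essentially all of the light, and the blue sample behind it is occluded, so the pixel comes out red. Because every step is differentiable, the same computation can be used to fit the underlying scene representation to photographs.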

Conclusion

This episode reveals that AI's next evolution lies in mastering 3D spatial intelligence through "world models." For Crypto AI investors and researchers, this signals a shift towards creating and interacting with rich, simulated environments, demanding new approaches to data, computation, and decentralized infrastructure.
