This episode examines the data infrastructure gap holding back real-world deployment of physical AI, and why traditional tabular data models fall short for embodied intelligence. Rerun CEO Nico West argues that scaling robotics beyond lab demos requires a new data paradigm.
Rerun's Data Platform: Powering Physical AI Debugging
- Rerun, an open-source SDK and commercial data platform, provides logging, data modeling, querying, and visualization for multimodal time-series data. Its initial focus on spatial computing (AR/VR headsets) has since expanded to learning-first robotics and autonomy.
- Rerun logs diverse data: 3D sensor output, multiple camera streams (RGB, motion), neural network outputs (bounding boxes), and time series (CPU usage, confidence scores).
- This makes it possible to debug complex multimodal systems as they evolve over time, which is crucial for computer vision and robotics.
- Beyond robotics, Rerun has found niche uses in quantitative finance, for example at hedge funds, where its high-performance streaming visualization suits complex time-series data.
- Nico West emphasizes the platform's radical simplicity and performance; it is built from scratch on a custom in-memory database.
- Nico West: "We just wanted to make it like a hundred times easier to like debug these sort of system multimodal systems like computer vision or robotic systems that do things over time."
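To make "debugging multimodal systems over time" concrete, here is a minimal sketch of the underlying idea: every stream is logged under an entity path with a timestamp, and a time window can then be viewed across all streams at once. This is not Rerun's SDK; the `DebugLog` class, its methods, and the entity paths are hypothetical.

```python
from collections import defaultdict
from typing import Any


class DebugLog:
    """Hypothetical in-memory log: entity path -> timestamped samples of any kind."""

    def __init__(self) -> None:
        self._rows: dict[str, list[tuple[float, str, Any]]] = defaultdict(list)

    def log(self, entity: str, t: float, kind: str, payload: Any) -> None:
        # Append one sample; different entities carry different data kinds.
        self._rows[entity].append((t, kind, payload))

    def timeline(self, t0: float, t1: float) -> list[tuple[float, str, str]]:
        # Merge all entities into one time-ordered view of the window [t0, t1],
        # the kind of view you scrub through when debugging behavior over time.
        events = [
            (t, entity, kind)
            for entity, rows in self._rows.items()
            for t, kind, _ in rows
            if t0 <= t <= t1
        ]
        return sorted(events)


dlog = DebugLog()
dlog.log("camera/front/rgb", 0.000, "image", "frame_0")
dlog.log("camera/front/detections", 0.000, "boxes2d", [(10, 20, 50, 60)])
dlog.log("robot/cpu", 0.000, "scalar", 0.42)
dlog.log("lidar/points", 0.050, "points3d", [(1.0, 0.2, 0.5)])
dlog.log("robot/cpu", 0.100, "scalar", 0.47)
for event in dlog.timeline(0.0, 0.06):
    print(event)
```

A real platform adds persistence, indexing, and rendering on top, but the core design choice is the same: no single table schema, just streams addressed by path and time.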
The State of Robotics: From Lab to Laundry
- Recent advancements in AI-first robotics demonstrate significant progress in advanced manipulation, a historically difficult problem. End-to-end learning methods now tackle tasks once considered impossible.
- Advanced Manipulation: Tasks like folding laundry, previously elusive for precise, pre-programmed robots, have become "boring" due to breakthroughs in end-to-end learning.
- Imitation Learning: This robotics equivalent of supervised learning involves teleoperating robots to record demonstration data (joint angles, perception inputs) and training neural networks to replicate tasks robustly.
- Reinforcement Learning (RL): Traditionally applied to motion control such as locomotion, RL is now being combined with imitation learning to produce robust manipulation capabilities.
- Scalable ML for Robotics: LLMs' "ChatGPT moment" showed the power of scalable machine learning and primed the field for similar breakthroughs in robotics. Progress has been driven by new ways of modeling robotics problems (which differ from text) and by the adoption of Transformers.
- Nico West: "Folding laundry has kind of gone from being impossible to sort of boring over the last year basically."
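Imitation learning, as described above, reduces to supervised regression from recorded observations to the actions the teleoperator took. The sketch below swaps the neural network for a one-feature linear policy fit by ordinary least squares, purely to show the shape of the problem; the demonstration numbers are synthetic and the names `behavior_clone` and `LinearPolicy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearPolicy:
    """Stand-in for the neural network policy: action = w * obs + b."""
    w: float
    b: float

    def __call__(self, obs: float) -> float:
        return self.w * obs + self.b


def behavior_clone(demos: list[tuple[float, float]]) -> LinearPolicy:
    # Supervised regression on (observation, teleoperated action) pairs,
    # solved with closed-form ordinary least squares for a single feature.
    n = len(demos)
    mean_o = sum(o for o, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in demos)
    var = sum((o - mean_o) ** 2 for o, _ in demos)
    w = cov / var
    return LinearPolicy(w=w, b=mean_a - w * mean_o)


# Synthetic teleoperation log: observed joint-angle error (rad) -> commanded
# joint velocity (rad/s), chosen to follow a = 2 * o + 0.5 exactly.
demos = [(-1.0, -1.5), (-0.5, -0.5), (0.0, 0.5), (0.5, 1.5), (1.0, 2.5)]
policy = behavior_clone(demos)
print(policy.w, policy.b, policy(0.25))  # 2.0 0.5 1.0
```

In practice the observation is camera images plus joint states and the policy is a large Transformer, but the training signal is the same: match the demonstrator's action.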
Bridging the Gap: Demos vs. Deployment
- Despite impressive demos, a significant gap exists between research prototypes and widely deployed, robust robotics products. Practicality often trumps aesthetic appeal in real-world applications.
- Productization Challenges: The "fat tail" of physical world variations makes full autonomy in homes difficult; product deployment requires extensive servicing, onboarding, and robust handling of countless small failure points.
- Practical Robotics: Less-hyped companies focus on aggressively deploying "scrappy" robots in warehouses, utilizing open-source Vision-Language-Action (VLA) models (neural networks that process visual, linguistic, and action data) and integrating human teleoperation where needed.
- Pockets of Production: While not yet at massive scale, companies deploy tens to hundreds of robots in manufacturing for simple pick-and-place tasks, demonstrating learning-based manipulation in production.
- Consumer Robotics: Beyond advanced vacuum cleaners (such as Matic, which offers better mapping and autonomy than Roomba), consumer robotics remains limited. The next step is likely vacuums that can also pick up toys.
- Nico West: "At the end of the day you're not it's not just a robot that can do a task. You need to build a product."
Rerun's Architectural Philosophy: Flexible Data for a Messy World
- Rerun's product design prioritizes flexibility and performance, recognizing the unique challenges of physical data. This led to multiple data model redesigns and an open-source visualization strategy.
- Flexible Data Model: Rerun's data model is inspired by entity-component systems (ECS), a game-development pattern in which entities are composed of interchangeable components. This lets users compose and visualize diverse data types without defining rigid schemas up front.
- Open-Source Visualization: Rerun made its base visualizer open source, reasoning that a debugging tool this widely deployed and essential is hard to monetize directly. This fosters adoption and lets companies embed and customize the visualizer.
- Commercial Backend: The commercial product provides a cloud backend for managing very large datasets, supporting the "record, curate, train" loop for robotics data pipelines.
- Physical Data Properties: Robotics data is inherently multimodal (images, sensor readings, text), multi-rate (different sensors update at different frequencies), and episodic (tasks occur as distinct sequences). This "physical data" does not fit cleanly into traditional tabular formats like Parquet or Iceberg, which necessitates custom file formats and indexing systems.
- Nico West: "We've actually redesigned the data model probably four times at this point."
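The combination of an ECS-style model and multi-rate data can be sketched in a few lines. In this hypothetical store (not Rerun's API), each (entity, component) pair is its own time-sorted column, so components can be added at any time without a schema change, and streams at different rates are aligned with an as-of ("latest-at") lookup: for a given time, return the most recent value at or before it.

```python
import bisect
from collections import defaultdict
from typing import Any


class EntityStore:
    """Hypothetical ECS-style store: no upfront schema; each (entity, component)
    pair is an independent time-sorted column, so streams can arrive at any rate."""

    def __init__(self) -> None:
        self._times: dict[tuple[str, str], list[float]] = defaultdict(list)
        self._values: dict[tuple[str, str], list[Any]] = defaultdict(list)

    def log(self, entity: str, t: float, **components: Any) -> None:
        # Attach any set of components to an entity at time t, keeping order.
        for name, value in components.items():
            key = (entity, name)
            i = bisect.bisect_right(self._times[key], t)
            self._times[key].insert(i, t)
            self._values[key].insert(i, value)

    def latest_at(self, entity: str, component: str, t: float) -> Any:
        # Most recent value at or before t; None if nothing was logged yet.
        times = self._times.get((entity, component), [])
        i = bisect.bisect_right(times, t)
        return self._values[(entity, component)][i - 1] if i else None


store = EntityStore()
# Illustrative rates: joint encoders every 10 ms, wrist camera every ~33 ms.
for k, t in enumerate([0.00, 0.01, 0.02, 0.03, 0.04]):
    store.log("robot/arm", t, joint_angles=[0.005 * k // 0.01 * 0.01, 0.1 + 0.02 * k])
store.log("camera/wrist", 0.000, image="frame_0")
store.log("camera/wrist", 0.033, image="frame_1")
# A new component appears mid-episode; no schema migration needed.
store.log("robot/arm", 0.05, joint_angles=[0.02, 0.2], gripper_force=3.2)

print(store.latest_at("camera/wrist", "image", 0.025))        # frame_0
print(store.latest_at("robot/arm", "gripper_force", 0.04))    # None
print(store.latest_at("robot/arm", "gripper_force", 0.05))    # 3.2
```

Episodes could be handled with a hypothetical entity-path prefix such as `episode_007/robot/arm`, or as a separate timeline; either way, the point is that none of this maps onto one fixed-width table row.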
The Future of Robotics: Data, Benchmarks, and Ecosystems
- The conversation highlights critical needs for the future of physical AI, including new data types, standardized evaluation, and the enduring power of established ecosystems like ROS.
- Emerging Data Types: Force and tactile touch sensors, along with audio, are becoming increasingly important for robots to interact effectively with the physical world.
- Benchmark Problem: Robotics lacks robust, generalizable benchmarks. The need to co-design and co-train models with specific hardware makes standardized evaluation difficult, often pushing benchmarks into less realistic simulations. Companies increasingly rely on internal, live robot testing.
- ROS's Staying Power: Despite common frustrations, the Robot Operating System (ROS) persists due to strong network effects and its role as an evolving standard for message passing and data definitions (e.g., image representation, 3D poses). This ecosystem enables pluggable, open-source modules.
- "Bubble" or Opportunity? High valuations for general-purpose robotics companies reflect the pursuit of enormous markets. Serving many use cases with the same hardware drives scale, which lowers hardware costs; that generality also demands vast amounts of data and compute.
- Nico West: "It's a huge problem for the field that there aren't any great benchmarks."
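Part of ROS's staying power is that modules agree on shared message definitions rather than on each other's code. ROS's `geometry_msgs/Pose`, for instance, is a `Point` position plus a `Quaternion` orientation. The sketch below mirrors that field layout with plain dataclasses (not the generated ROS classes) and pairs a hypothetical producer with a hypothetical consumer that only depend on the shared layout, so either could be swapped out.

```python
import math
from dataclasses import dataclass, field


# Field layout mirrors ROS geometry_msgs Point / Quaternion / Pose.
# These are plain dataclasses, NOT the classes ROS generates.
@dataclass
class Point:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


@dataclass
class Quaternion:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    w: float = 1.0  # identity rotation


@dataclass
class Pose:
    position: Point = field(default_factory=Point)
    orientation: Quaternion = field(default_factory=Quaternion)


def localizer_stub() -> Pose:
    # Hypothetical producer: any localizer that emits a Pose is pluggable here.
    half = math.pi / 4  # half-angle of a 90-degree rotation about z
    return Pose(Point(3.0, 4.0, 0.0), Quaternion(0.0, 0.0, math.sin(half), math.cos(half)))


def planar_state(pose: Pose) -> tuple[float, float]:
    # Hypothetical consumer: planar distance from the origin and yaw (radians),
    # written only against the shared message layout.
    p, q = pose.position, pose.orientation
    dist = math.hypot(p.x, p.y)
    yaw = math.atan2(2.0 * (q.w * q.z + q.x * q.y), 1.0 - 2.0 * (q.y ** 2 + q.z ** 2))
    return dist, yaw


dist, yaw = planar_state(localizer_stub())
print(dist, math.degrees(yaw))
```

The yaw line is the standard quaternion-to-heading conversion; the design point is that neither function imports the other.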
Investor & Researcher Alpha
- Data Infrastructure as a Bottleneck: The core challenge for physical AI is not just model development but the lack of mature, flexible data infrastructure. Traditional data formats and processing tools (such as Spark and Databricks, which serve LLM workloads well) are inadequate for multimodal, multi-rate, episodic robotics data. Investment in specialized data platforms (like Rerun) is critical.
- Focus on Robustness over Raw Intelligence: The immediate breakthrough in physical AI will be significantly improved robustness for longer, more complex tasks, including self-correction and on-the-fly learning. High-level reasoning will follow.
- The "Generalist Robot" Thesis: The pursuit of general-purpose robots (especially humanoids or semi-humanoids) is driven by the economic imperative to achieve hardware scale by serving diverse use cases. This necessitates massive data collection, often favoring human-like form factors for easier teleoperation and data generation.
Strategic Conclusion
- Physical AI's advancement hinges on a fundamental shift in data management. The industry must move beyond traditional tabular models to embrace flexible, high-performance systems designed for multimodal, multi-rate, episodic data. The next step involves building robust, scalable data pipelines that empower researchers to iterate rapidly on real-world physical data, accelerating the transition from impressive demos to ubiquitous, intelligent robots.