This episode explores the critical challenge of building AI that learns abstract knowledge from few examples, moving beyond large-scale imitation towards human-like reasoning through compositionality and active world modeling.
Speaker Introductions and Core Research Goals
- Benjamin Crouzier: Introduces Tufa Labs, a new AI research lab focused on models that reason effectively, pursuing AGI-oriented research and offering early team members high freedom and impact.
- Kevin Ellis: Discusses his interest in creating machines that learn more like humans, focusing on abstract knowledge from fewer examples, world models, and discovering symbolic knowledge. He emphasizes learning communicable knowledge beyond just neural network weights and enabling cooperation between symbolic and neural approaches.
- Zenna Tavares: Co-founder of Basis, shares similar interests with Kevin in understanding and building intelligence, applying it to scientific and societal problems. Basis aims to leverage powerful hardware and pre-trained models while integrating ideas from cognitive science and classic AI, acknowledging that pure scaling might not be the complete path to AGI. Zenna notes, “Our personal belief is that that's not going to carry us all the way.”
Learning from Examples: Beyond Imitation
- Zenna Tavares frames human learning through a Bayesian lens: we hold prior beliefs (hypotheses) about the world and update them as examples are observed (see the sketch after this list). Bayesian inference provides a theoretical foundation for how an ideal agent incorporates knowledge, though humans implement it only approximately.
- A key distinction in their research is moving beyond large-scale imitation learning, prevalent in current mainstream ML due to its effectiveness. They aim to understand and build intelligent machines from "first principles," identifying core components like uncertainty, causality, and reasoning, while acknowledging many unknown pieces remain.
- Strategic Implication: Investors should recognize the potential limitations of purely imitation-based AI. Research focusing on first principles like causality and reasoning, as pursued by Basis, could unlock more robust and generalizable AI, representing a different long-term investment thesis compared to pure scaling plays.
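To ground the Bayesian framing above, here is a minimal sketch of belief updating over a toy hypothesis space (in the style of Tenenbaum's "number game"); the hypotheses, prior, and examples are illustrative assumptions, not taken from the episode.

```python
import math

# Toy hypothesis space over the integers 1..100 (illustrative assumptions).
hypotheses = {
    "even numbers":   {n for n in range(1, 101) if n % 2 == 0},
    "multiples of 4": {n for n in range(1, 101) if n % 4 == 0},
    "any number":     set(range(1, 101)),
}
prior = {name: 1 / len(hypotheses) for name in hypotheses}

def likelihood(extension, example):
    # Size principle: a smaller consistent hypothesis assigns more probability to each example.
    return 1.0 / len(extension) if example in extension else 0.0

def posterior(prior, examples):
    unnorm = {name: prior[name] * math.prod(likelihood(ext, x) for x in examples)
              for name, ext in hypotheses.items()}
    z = sum(unnorm.values())
    return {name: p / z for name, p in unnorm.items()}

# Three examples consistent with all three hypotheses still concentrate belief on the
# tightest one ("multiples of 4"), mirroring few-shot generalization from sparse data.
print(posterior(prior, [4, 8, 16]))
```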
Compositionality: Power and Peril
- Kevin Ellis defines compositionality in their context as using atomic knowledge learned in one situation to build larger structures, enabling extrapolation to new, even out-of-distribution, scenarios. This is crucial for adapting to changing environments where underlying building blocks remain consistent.
- Referencing psychologist Elizabeth Spelke, Kevin highlights the "curse of compositionality"—the double-edged sword where the ability to represent infinitely many concepts leads to a combinatorial explosion that overwhelms the system with possibilities (a toy illustration follows this list). Early symbolic AI struggled to guide search through this vast space.
- Modern approaches aim to mitigate this curse by learning to guide searches over program spaces (like spaces of possible code) and learning the fundamental "atoms" of compositional languages, potentially treating them as neural networks.
- Actionable Insight: While compositionality offers powerful generalization, its inherent complexity presents challenges. Crypto AI projects leveraging compositional structures need robust mechanisms (potentially learned heuristics or neural guidance) to navigate the vast search space effectively, a key factor for researchers evaluating system design.
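To make the "curse" concrete, a toy count of how fast a compositional search space grows: even ten unary grid-transformation primitives (hypothetical placeholder names) composed into short pipelines yield an enormous number of candidate programs.

```python
# Hypothetical primitive names, purely for illustration.
PRIMITIVES = ["rotate90", "mirror_x", "recolor", "crop_to_content", "tile",
              "flood_fill", "translate", "scale_up", "overlay", "count_objects"]

def num_pipelines(n_primitives, max_depth):
    """Number of straight-line compositions p1 ∘ p2 ∘ ... ∘ pk with k <= max_depth."""
    return sum(n_primitives ** k for k in range(1, max_depth + 1))

for depth in (2, 4, 6, 8):
    print(f"depth <= {depth}: {num_pipelines(len(PRIMITIVES), depth):,} candidate programs")

# With only 10 primitives and depth 8 there are already ~10^8 pipelines; real program
# spaces (arguments, branching, loops) grow far faster -- hence the need for learned
# guidance over the search rather than blind enumeration.
```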
AI vs. Human Compositional Reasoning
- Zenna Tavares points to programming languages as prime examples of compositional systems, where complex programs are built by combining simpler parts. Unlike natural language, programming languages are often strictly compositional.
- Building systems that create compositional structures involves various methods, from traditional grammar-based search to modern LLM-based program generation. There's a spectrum regarding how much semantic knowledge of the structure is incorporated versus relying purely on data.
- The core challenge remains searching the vast space of possible compositional structures (like programs). Smart methods are needed to find the desired structure efficiently.
- Research Focus: Developing efficient search and generation methods for compositional structures, potentially blending symbolic constraints with neural generation, is a critical research area impacting the feasibility of complex AI reasoning systems.
Finding Primitive Abstractions
- Kevin Ellis argues that some sets of primitive building blocks (abstractions) might be inherently better than others, using the evolution of programming languages as an example. While Python is highly effective, it's easy to imagine worse languages, implying a hierarchy of quality.
- However, the optimal set of primitives likely depends on the specific problems and environments the AI will face. This is reflected in the real world by the existence of numerous programming languages tailored to different needs, suggesting a "Pareto frontier" (a set of optimal solutions where improving one aspect requires degrading another) rather than a single best language.
- Strategic Consideration: The choice of foundational primitives or representations in a Crypto AI system is crucial. Systems using adaptable or learnable primitives might hold an advantage over those with fixed, potentially suboptimal, ones, especially in diverse or evolving domains.
Neural vs. Symbolic: Induction and Transduction in ARC
- Zenna Tavares clarifies the relationship between different AI paradigms: neural networks implement algorithms, the Bayesian paradigm is a normative model (what an ideal system should do), and composite methods combine different systems.
- Their ARC (Abstraction and Reasoning Corpus) paper explored combining methods: an induction model (generating an explicit program, like Python code, to solve the task) and a transduction model (directly predicting the output grid from the input, more like a standard neural network).
- Kevin Ellis emphasizes the difference in "type signature": the induction model outputs a function (the program), while the transduction model outputs the solution directly. Their work compared not just neural vs. symbolic but also these two problem-solving styles: explicit, symbolic reasoning (induction) versus intuitive, implicit prediction (transduction).
- Technical Note: The Abstraction and Reasoning Corpus (ARC) is a benchmark designed by François Chollet to measure abstract reasoning capabilities in AI, using visual analogy puzzles solvable with few examples.
The Ensemble Approach and Cognitive Parallels
- The researchers used an ensemble where the system first tries the induction model (explicit function generation); if that fails to produce a verifiable solution, it falls back to the transduction model (direct prediction), as sketched after this list. Kevin notes, “because you can check the correctness of induction... you can just fall back on your intuition.”
- This mirrors findings in cognitive science: humans sometimes perform worse on certain tasks (like inferring rules with exceptions or statistical learning) if forced to think carefully and verbalize, suggesting distinct cognitive processes. LLMs show similar splits.
- Some ARC problems were better solved by systematic search (induction), while others were better solved by the model "blurting out an answer" (transduction). The ability to verify inductive solutions makes the ensemble strategy effective.
- Insight: The success of the ensemble highlights that different reasoning styles excel at different problems. AI systems capable of flexibly employing both explicit, verifiable reasoning and implicit, intuitive prediction may be more robust and versatile.
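A minimal sketch of the ensemble logic above, with hypothetical stand-ins for the two models: candidate programs from the induction model are checked against the training pairs, and only if none verify does the system fall back to the transduction model's direct guess.

```python
# Sketch of the induction-first ensemble; the model interfaces are hypothetical stand-ins.

def solve_arc_task(train_pairs, test_input, induction_model, transduction_model, n_samples=64):
    """train_pairs: list of (input_grid, output_grid); returns a predicted output grid."""
    # 1) Induction: sample candidate programs and keep one only if it reproduces every
    #    training output -- correctness is checkable, so failed programs are filtered out.
    for program in induction_model.sample_programs(train_pairs, n=n_samples):
        try:
            if all(program(x) == y for x, y in train_pairs):
                return program(test_input)      # verified program: trust its output
        except Exception:
            continue                            # ill-formed program: discard it
    # 2) Transduction: no program verified, so fall back on the direct, "intuitive" guess.
    return transduction_model.predict(train_pairs, test_input)
```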
Defining "Thinking" in AI
- Zenna Tavares approaches the question cautiously, suggesting "thinking" is hard to define precisely but seems to involve a step-by-step process, internal computations, and belief revision. He distinguishes between knowledge representation (symbolic program vs. neural weights) and the procedure used.
- While their ARC methods represent forms of thinking, a "slow deliberative hypothesis forming" aspect, perhaps closer to human reasoning or complex LLM processes like chain-of-thought, might not be fully captured. Reasoning can range from classical logic to messy, common-sense human reasoning.
- Key Takeaway: Defining "thinking" remains complex. Investors should look beyond simplistic labels and evaluate AI systems based on their specific reasoning capabilities, representations, and the types of problems they can solve, recognizing the spectrum from intuitive pattern matching to deliberate, verifiable reasoning.
Combining Transduction and Induction: Future Directions
- Kevin Ellis explains the bias towards induction in their ensemble: explicit programs generated by high-level languages are more regularized and less prone to overfitting compared to neural networks, which can sometimes just interpolate between data points. A verifiable, explicit description is likely to generalize better.
- Zenna Tavares suggests exploring hybrid "transductive-inductive" models where a neural network acts as the transformation but can be applied point-wise and verified against training examples.
- More fundamentally, he questions the limitations of current representations (Python vs. neural nets) and advocates exploring the "sea of all possible programming languages." This includes neuro-symbolic programming (combining neural and classical components) and potentially restructuring programming languages themselves to better capture desired computational properties.
- Research Frontier: Moving beyond simple ensembles towards deeper integration or fundamentally new computational representations that combine the strengths of symbolic structure and neural flexibility is a key area for future breakthroughs.
The Role and Evolution of Programming Languages
- Kevin Ellis contrasts Domain-Specific Languages (DSLs), used in earlier systems like DreamCoder, with general-purpose languages like Python. While DSLs can constrain the search space, they may lack representational power ("you can't learn what you can't represent"), and adding escape hatches to make a DSL Turing-complete reintroduces the curse of compositionality.
- He argues Python is often more practical for current AI challenges (ARC, LLM agents, VQA) due to convergent evolution in software engineering creating powerful, general tools. However, Python isn't perfect, as shown by the ARC results where transduction sometimes outperformed it.
- Zenna Tavares emphasizes the historical evolution of programming languages, adding layers of structure (structured programming, classes, types, modules) to help manage complexity and encode more knowledge. He sees this evolution continuing, potentially accelerated by AI, leading to new ways programming languages and AI systems interact.
- Implication: The tools used to build AI matter. While powerful LLMs can generate code, the underlying structure and expressiveness of the target language (like Python) significantly impact what can be practically achieved. Future developments may involve AI influencing language design itself.
Iterative Refinement and Reinforcement
- Zenna Tavares argues intuitively for iterative refinement: it seems unlikely that a system can always generate the perfect program or hypothesis in one shot. Revising models in light of new information (e.g., failures, evidence) mirrors how humans and scientists develop hypotheses (a schematic loop follows this list).
- A major challenge is guiding this refinement process, especially without a clear objective function. How do systems know if a refinement path is "good"? Current ML uses human feedback or backpropagation from a defined goal, but this isn't always possible.
- Kevin Ellis reinforces that checking progress is crucial. This is feasible in ARC (checking against examples) or math problems (checking the answer) but harder in open-ended scenarios. Learning world models offers a way to check predictions against observed data.
- Open Question: Developing mechanisms for guiding refinement and learning without explicit, easily computable reward signals is a critical bottleneck for building more autonomous, exploratory AI systems relevant to complex real-world or crypto-economic scenarios.
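One way to picture the refine-and-check loop in domains where progress is verifiable; the propose/revise/check callables are hypothetical stand-ins, not components from the episode.

```python
# Sketch of verifiable iterative refinement (propose/revise/check are hypothetical).

def refine(propose, revise, check, evidence, max_rounds=10):
    """
    propose()           -> initial hypothesis (e.g., a candidate program or model)
    check(h, evidence)  -> list of failures (empty means h explains all the evidence)
    revise(h, failures) -> new hypothesis informed by what went wrong
    """
    hypothesis = propose()
    for _ in range(max_rounds):
        failures = check(hypothesis, evidence)
        if not failures:
            return hypothesis        # verified against everything we can check
        hypothesis = revise(hypothesis, failures)
    return hypothesis                # best effort; open-ended domains may never fully verify
```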
Wake-Sleep Learning and DreamCoder
- Kevin Ellis explains the wake-sleep algorithm philosophy, drawing parallels to learning inverse problems in ML (like inferring 3D structure from 2D images). The "sleep" phase involves a top-down generative process (dreaming/imagining possibilities, e.g., generating programs) and running them forward. The "wake" phase learns the backward inference (e.g., inferring the program from observed behavior).
- Crucially, wake-sleep involves a back-and-forth: learning from synthetic "dream" data, then waking, interacting with the real world, identifying mismatches, and adjusting the dream distribution for the next sleep cycle (schematized after this list). This allows adaptation to distribution shifts.
- He highlights the role of compositionality here: programs allow dreaming up plausible new combinations of learned knowledge, preparing the system for unseen but related situations. Dreams should go slightly beyond waking experience.
- Concept: Wake-sleep provides a principled way for models to generate their own training data and adaptively refine their internal models based on interaction, potentially valuable for AI agents in dynamic crypto environments.
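A schematic of the wake-sleep cycle described above: sleep dreams up programs from the current generative model and trains the inference network on the resulting (behavior, program) pairs; wake solves real tasks, and solved tasks reshape the next round of dreams. The component interfaces are hypothetical.

```python
# Schematic wake-sleep loop; the component interfaces are hypothetical stand-ins.

def wake_sleep(dream_model, inference_net, real_tasks, n_cycles=5, n_dreams=1000):
    solved = []
    for _ in range(n_cycles):
        # --- Sleep: dream programs, run them forward, learn the backward inference. ---
        dreams = []
        for _ in range(n_dreams):
            program = dream_model.sample()      # top-down generation ("dreaming")
            behavior = program.run()            # execute to get observable behavior
            dreams.append((behavior, program))
        inference_net.train(dreams)             # learn to map behavior -> program

        # --- Wake: face real tasks, keep what works, and adjust the dream distribution. ---
        for task in real_tasks:
            program = inference_net.infer(task.observations)
            if task.verify(program):
                solved.append((task, program))
        dream_model.update(solved)              # next cycle dreams near (and slightly
                                                # beyond) the waking experience
    return inference_net, solved
```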
From Explicit Libraries to Implicit Learning
- Kevin Ellis describes a shift from DreamCoder's explicit, learned library of symbolic functions towards more implicit methods built on large foundation models. Instead of a discrete library, modern systems might use retrieval (finding relevant past code examples) and neural generation (producing similar code via in-context learning); a sketch of this retrieval-plus-prompting pattern follows this list.
- He views this as approximating library learning in a softer, probabilistic way. While valuable, he believes explicit, reusable libraries (like those software engineers build) are complementary and ultimately desirable, but automatically building and debugging robust AI libraries remains challenging. In-context learning serves as a practical middle ground.
- Trend: The field is leveraging LLMs to implicitly capture knowledge that was previously explicitly structured in libraries. While pragmatic, the challenge of building robust, reusable, and verifiable knowledge components persists – a key area for research impacting reliable AI deployment.
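A sketch of the "soft library" pattern: rather than maintaining an explicit library of functions, retrieve the most similar previously solved problems and place their code in the prompt. The embed() and llm() callables are hypothetical placeholders for an embedding model and an LLM call, not a specific API.

```python
# Retrieval + in-context learning as a soft, implicit library.
# embed() and llm() are hypothetical placeholders, not a specific API.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

def solve_with_soft_library(task_description, solved_archive, embed, llm, k=3):
    """solved_archive: list of (description, code) pairs from previously solved tasks."""
    query = embed(task_description)
    # Retrieval stands in for an explicit library lookup: find the k most similar past solutions.
    ranked = sorted(solved_archive,
                    key=lambda item: cosine(embed(item[0]), query), reverse=True)
    exemplars = "\n\n".join(f"# Task: {desc}\n{code}" for desc, code in ranked[:k])
    # In-context learning stands in for explicit library reuse.
    prompt = f"{exemplars}\n\n# Task: {task_description}\n# Write code for this task:\n"
    return llm(prompt)
```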
Exploration vs. Exploitation in Knowledge Building
- Zenna Tavares frames the decision to build a reusable library function (or abstraction) in terms of expected utility: a function is worth creating if it compactly expresses current needs and is expected to be useful for future tasks (by oneself or others), effectively caching computation (see the toy scoring rule after this list).
- This involves a trade-off, balancing immediate needs with anticipated future requirements, similar to the exploration-exploitation dilemma. He suggests this could potentially be formalized using rational decision theory.
- Investor Lens: AI systems that can effectively balance building general, reusable knowledge components (exploration/future value) with solving immediate tasks (exploitation) may be more adaptable and efficient long-term investments.
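The trade-off can be written as a simple expected-utility test; this scoring rule is a hypothetical illustration, not a formula from the episode: create the abstraction when the compression it buys now, plus discounted expected future reuse, outweighs the cost of defining and maintaining it.

```python
# Hypothetical scoring rule for "is this abstraction worth creating?" (illustrative only).

def abstraction_utility(saved_per_use, current_uses, expected_future_uses,
                        definition_cost, p_future_relevance):
    """
    saved_per_use:        effort saved each time the abstraction replaces inline code
    current_uses:         existing call sites it would compress today (exploitation)
    expected_future_uses: anticipated reuse by oneself or others (the exploratory bet)
    definition_cost:      cost of writing, naming, documenting, and maintaining it
    p_future_relevance:   belief that future tasks will still resemble today's tasks
    """
    immediate_value = saved_per_use * current_uses
    future_value = saved_per_use * expected_future_uses * p_future_relevance
    return immediate_value + future_value - definition_cost

# Modest immediate savings, but high expected reuse tips the decision toward abstracting.
print(abstraction_utility(saved_per_use=5, current_uses=2, expected_future_uses=10,
                          definition_cost=20, p_future_relevance=0.7) > 0)   # True
```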
Testing Causal Abstractions through Interaction
- Kevin Ellis stresses that verifying whether learned abstractions truly capture causal relationships requires interaction with the world: an agent needs to perform interventions (actions) and test whether its model accurately predicts the consequences (sketched after this list).
- Pure function learning or program synthesis makes it hard to distinguish between equivalent representations. However, when an agent must achieve goals, plan, and intervene, incorrect causal models can be falsified by observing outcomes. This contrasts with program synthesis libraries, which are judged on usefulness, not causal faithfulness.
- Requirement for Robust AI: For AI operating in real-world systems (like crypto markets or protocols), the ability to learn and validate causal models through interaction is paramount for reliable prediction and decision-making. Passive observation is insufficient.
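A minimal sketch of falsifying candidate causal models through intervention; the environment and model interfaces are hypothetical: act, predict the consequence, and reject any model whose prediction diverges from what is observed.

```python
# Falsifying causal models by intervening and checking predictions (hypothetical interfaces).

def falsify_by_intervention(candidate_models, environment, interventions):
    """Keep only the models whose predicted consequences match what actually happens."""
    surviving = list(candidate_models)
    for action in interventions:
        observed = environment.do(action)                        # perform the intervention
        surviving = [m for m in surviving
                     if m.predict_outcome(action) == observed]   # passive fit is not enough
        if len(surviving) <= 1:
            break                                                # at most one causal story left
    return surviving
```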
The Autumn Paper: Inferring Latent Dynamics
- Zenna Tavares introduces the Autumn paper, which aimed to synthesize the source code of simple video game-like environments by observing interactions. The core idea is that inferring the underlying generative code from dynamics is akin to scientific modeling.
- A key focus was inferring latent state—hidden variables or properties of the world that aren't directly observable but influence dynamics. This is crucial as the real world is full of complex, hidden states.
- However, Zenna notes a limitation: the models Autumn inferred reproduced the ground-truth source code rather than abstracting from it. Real human cognition relies on abstraction, omitting details. A major open question is how to infer abstract models that capture the essential aspects while discarding irrelevant details.
- Challenge: Developing AI that can infer not just observable dynamics but also relevant hidden states and appropriate levels of abstraction is essential for modeling complex systems like economies or user behavior.
Navigating Abstraction Hierarchies
- Kevin Ellis observes that humans often define abstractions on the fly, specific to the problem, rather than relying on one fixed hierarchy. When discarding information to create abstractions, the problem becomes under-constrained.
- Introducing a reward signal provides constraints: a good abstraction is one that helps achieve rewards (as seen in systems like MuZero). His work on "VisualPredicator" showed robots learning abstract representations from pixels by focusing on task-relevant information and ignoring other details.
- However, humans form abstract models even without explicit rewards (e.g., playing with a new object). How this works is an open question, possibly related to intrinsic motivation or robustness across potential future goals.
- Key Insight: Abstraction isn't monolithic. Effective AI may need multiple levels of abstraction and the ability to select or construct the appropriate level dynamically based on the task, goals, or computational constraints.
Resource Rationality and Multiple World Models
- Zenna Tavares finds the Resource Rationality framework compelling: agents should aim to do the best they can given their computational resources and beliefs about future tasks. This provides a principled way to think about choosing or constructing abstractions – balancing accuracy, computational cost, and expected usefulness.
- He emphasizes a crucial, often overlooked point: there isn't one world model. We understand things at multiple levels (e.g., a camera as a button-press device vs. its internal circuits vs. sensor physics). He proposes the term "poly-structural" for systems that explicitly represent multiple models of reality and the relationships between them.
- Encoding the relationship between a model and reality (what it captures, what it omits) within the AI system itself, rather than just in the human designer's head, is a hard but vital computer science challenge. It's unclear if this will emerge from scale or needs explicit design.
- Future AI Architecture: Systems capable of maintaining and reasoning across multiple, interconnected models of the world at different abstraction levels could offer significantly more flexibility and robustness than single-model approaches.
Automating Epistemic Foraging: Learning Priors
- Kevin Ellis warns against building "Frankenstein systems" by manually hardcoding numerous knowledge representations and heuristics. The goal should be rational analysis from first principles, but this leads to computationally hard search problems.
- He suggests using learned neural networks as heuristic guides within a first-principles framework: the neural net can propose potentially good abstractions or reasoning steps (leveraging common-sense priors from pre-training), while the overall system retains a principled way to evaluate them (outlined in the sketch after this list).
- Zenna Tavares discusses learning inductive biases (priors). While classic cognitive models often required smart humans to encode these biases, the "bitter lesson" of AI suggests learning them from data is preferable when data and compute are abundant.
- He advocates for learning implicit priors from large, potentially rich datasets (beyond just internet text, perhaps including interaction or observation data), adhering to Bayesian principles but avoiding explicit hand-coding.
- Path Forward: Combining principled frameworks (like Bayesian inference or resource rationality) with learned components (neural heuristics, implicitly learned priors) appears a promising direction for building powerful yet grounded AI systems, reducing reliance on brittle hand-engineering.
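In outline, the "neural proposes, principled framework disposes" pattern might look like this; the proposer and scoring function are hypothetical stand-ins.

```python
# Outline of neural proposals evaluated by a principled criterion (hypothetical interfaces).

def propose_and_evaluate(neural_proposer, principled_score, context, n_proposals=32):
    """
    neural_proposer.sample(context)      -> candidate abstraction or reasoning step,
        drawing on common-sense priors absorbed during pre-training.
    principled_score(candidate, context) -> e.g., a Bayesian posterior, an MDL cost,
        or a resource-rational utility; the final arbiter is not the neural net.
    """
    candidates = [neural_proposer.sample(context) for _ in range(n_proposals)]
    return max(candidates, key=lambda c: principled_score(c, context))
```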
Defining Abstraction (Revisited)
- Kevin Ellis reiterates that abstraction fundamentally involves hiding details while retaining essence. In programming, it's often synonymous with lambda expressions (functions abstracting over variable values). In causality, it relates to mapping between causal models where the abstract one ignores details but preserves key relationships.
- Core Concept: Understanding abstraction as a process of selective information hiding is key to designing AI that can simplify complex realities into useful, manageable models.
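In code, the programming-language sense of abstraction is ordinary lambda abstraction: a detail that varied across concrete cases becomes a parameter, and everything else is hidden behind the function boundary. A tiny illustrative example:

```python
# Abstraction as lambda abstraction: the varying detail becomes a parameter.

# Three concrete, duplicated computations...
area_small = 3.14159 * 2 * 2
area_medium = 3.14159 * 5 * 5
area_large = 3.14159 * 9 * 9

# ...abstracted into one function: the radius is exposed, the formula is hidden.
circle_area = lambda r: 3.14159 * r * r

assert circle_area(2) == area_small and circle_area(9) == area_large
```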
Richer Ontologies and Abstract Programs
- Zenna Tavares addresses the idea of adding "galaxy brain" concepts like causality or time as primitives. The challenge lies in connecting these high-level concepts to concrete transformations applicable to the task (like manipulating ARC grids).
- He proposes a more concrete direction: developing abstract program representations. Instead of generating fully specified programs, systems could generate program sketches with "holes" or unspecified details. This mirrors human problem-solving, where we often grasp the abstract structure before filling in specifics. Such representations could guide search more effectively.
- Potential Breakthrough: Enabling AI to represent and reason with partially specified, abstract programs or models could significantly improve efficiency and mimic human-like hypothesis refinement.
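A toy version of the sketch-with-holes idea: commit to the abstract structure of the program first, then search only over the unspecified details and verify against the examples. The sketch shape, hole candidates, and task are illustrative assumptions.

```python
from itertools import product

# Toy "program sketch with holes": the structure is fixed; only the holes are searched.

def make_program(scale, offset):
    # Sketch: output = input * HOLE1 + HOLE2 -- the abstract shape is decided up front.
    return lambda x: x * scale + offset

HOLE_CANDIDATES = {"scale": [1, 2, 3, -1], "offset": [0, 1, 2, 5]}

def fill_holes(examples):
    """Search only the holes, not the whole program space, and verify on the examples."""
    for scale, offset in product(HOLE_CANDIDATES["scale"], HOLE_CANDIDATES["offset"]):
        program = make_program(scale, offset)
        if all(program(x) == y for x, y in examples):
            return {"scale": scale, "offset": offset}
    return None

# Examples consistent with y = 2x + 1.
print(fill_holes([(0, 1), (3, 7), (10, 21)]))   # -> {'scale': 2, 'offset': 1}
```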
Human Strategies for Solving ARC
- Kevin Ellis reflects on his own ARC-solving process, noting it is often intuitive and perceptual ("denoising" the input, imagining the output) but sometimes involves systematic thought and half-formed hypotheses. It feels more dynamic than simply generating thousands of candidate programs (the Greenblatt-style approach).
- He cautions against over-interpreting introspection but suggests the different types of mistakes humans and AI make indicate that current AI approaches may not fully capture the dynamics of human solution construction.
- Implication: There might be fundamental differences between current AI search/generation strategies and the dynamic, perceptual, hypothesis-driven process humans use, suggesting avenues for new algorithm development.
Iterative Application and Data Flow
- Zenna Tavares suggests exploring iterative approaches to ARC, perhaps applying the induction and transduction models sequentially or using a REPL-like (Read-Eval-Print Loop) interaction where the AI writes code, evaluates it, analyzes the results, and writes more code step by step (a rough outline follows this list).
- Kevin Ellis strongly agrees with the power of iteration, especially for agents accumulating knowledge over time. Factored representations like DAGs (Directed Acyclic Graphs) or cooperating smaller programs might be better suited for both ARC and continuous learning than monolithic solutions.
- Architectural Shift: Moving towards iterative, step-by-step refinement processes, potentially using more modular or graph-based knowledge representations, could enhance AI's ability to tackle complex problems and learn continuously.
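A rough outline of the REPL-style loop (all component calls are hypothetical stand-ins): the system writes a small piece of code, executes it, inspects the result, and decides what to write next, accumulating a factored, step-by-step solution rather than one monolithic program.

```python
# REPL-style solve loop; code_model, executor, and task are hypothetical stand-ins.

def repl_solve(task, code_model, executor, max_steps=8):
    state = {"task": task, "history": []}           # accumulated intermediate results
    for _ in range(max_steps):
        snippet = code_model.next_step(state)       # write a small step, not a whole program
        result = executor.run(snippet, state)       # evaluate it (the "eval" in read-eval-print)
        state["history"].append((snippet, result))  # keep the factored trace of steps
        if task.is_solved(result):
            return result, state["history"]         # the trace doubles as an explanation
    return None, state["history"]
```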
Is Massive Computation in the Spirit of ARC?
- Zenna Tavares acknowledges François Chollet's view of ARC as an imperfect benchmark. Some solutions might rely on "ARC hacks" rather than fundamental insights. Their own approach is likely a mix. Over-specializing a DSL to ARC specifics might miss the intended essence.
- There's an open question about whether ARC can be solved purely from its own data or requires external knowledge (like internet pre-training). Chollet believes the ARC data is sufficient, but empirically, top solutions leverage pre-training.
- The ideal ARC solution would likely be simple, elegant, and avoid task-specific hacks. Introducing related but distinct problems could push research towards more generalizable solutions.
- Benchmark Design: Evaluating AI progress requires careful benchmark design. Overfitting to specific benchmarks can be misleading; focusing on the underlying principles (like few-shot abstract reasoning) across diverse tasks is crucial.
Designing a Better ARC: Project MARA
- Kevin Ellis describes Project MARA (Modeling, Abstraction, Reasoning, Agency/Acting), a joint effort with Zenna at Basis, as aiming to create benchmarks in the spirit of ARC but involving interaction. It's less like standard RL and more like active model-building from few examples in an interactive setting.
- This interaction makes generating synthetic problems harder but acts as a forcing function against overfitting, pushing towards more robust learning.
- Next-Generation Benchmarks: The field needs benchmarks that go beyond passive pattern recognition to assess active learning, interaction, and model building in complex environments – key capabilities for real-world AI applications.
Why Empirical Differences Between Transduction and Induction?
- Zenna Tavares attributes the differences primarily to the underlying representation. Neural networks (transduction in their setup) and Python programs (induction) are different "languages" for expressing transformations. Some concepts are simply easier to express compactly or efficiently in one versus the other, even if both are theoretically universal (can express any computable function).
- He again distinguishes the representation (neural vs. symbolic code) from the "type signature" (outputting a function vs. outputting a solution directly), noting their paper somewhat conflated these, and further exploration could separate them.
- Kevin Ellis adds the computational perspective: transformers have finite computation per pass, corresponding to a specific complexity class, while Python programs allow potentially unbounded loops. Yet, empirically, neural networks can sometimes solve problems easily that are hard to express concisely in Python, a phenomenon not fully theoretically understood but empirically robust.
- Core Trade-off: The choice of representation (neural, symbolic program, hybrid) imposes fundamental trade-offs in expressiveness, efficiency, and ease of learning for different types of computational tasks.
Program Induction by Example and Wake-Sleep with LLMs
- Kevin Ellis describes their "Program Induction by Example" paper (with Wen-Ding Li) as an attempt to implement DreamCoder-style wake-sleep using modern LLMs. It starts with a few human-written programs, uses an LLM to generate similar "dream" programs (forward model/sleep phase), runs them, and trains a synthesizer on the resulting input-output pairs (backward model/wake phase).
- This proved highly effective, substituting symbolic machinery with neural components and leveraging scale. The full wake-sleep cycle was also implemented: solving real problems, remembering solutions, and dreaming variations, allowing adaptation. It lacked DreamCoder's explicit library but used softer, in-context learning.
- LLMs as Components: This demonstrates how LLMs can be integrated into more structured learning frameworks like wake-sleep, acting as powerful generative models for code or hypotheses, bridging symbolic goals with connectionist methods.
Pragmatism Prevailing: Convergence of AI Paradigms
- Zenna Tavares observes a convergence: connectionists embrace hybrid models (like LLMs calling Python) because symbolic tools excel at certain computations, while symbolic AI proponents recognize the need to handle real-world messiness, leading towards integrating neural methods.
- He notes something "obviously right" about scale and learning (connectionism) and something "obviously right" about structured knowledge (symbolic systems), plus normative principles (like Bayesian reasoning). Current convergence often involves compositionally plugging systems together.
- The deeper question is whether we can re-engineer AI from the ground up to integrate these strengths more fundamentally, rather than just composing existing modules.
- Market Trend: Expect continued development of hybrid AI systems. Investors should assess how effectively projects integrate different paradigms, moving beyond simple composition towards potentially more powerful, deeply integrated architectures.
Project MARA and Everyday Science
- Zenna Tavares elaborates on Project MARA at Basis (led with Kevin), focusing on Modeling, Abstraction, Reasoning, and Agency. It can be initially conceived as "active ARC"—building systems that interact with an environment to learn abstract models.
- The project involves developing both new interactive benchmarks and algorithms. A core focus is "everyday science": the process humans (adults and children) use to learn about new objects, devices, or interfaces by interacting, forming hypotheses, and revising beliefs – distinct from large-scale imitation learning.
- They believe the principles underlying everyday science are the same as those in formal science and aim to build systems capable of this interactive, hypothesis-driven learning. Kevin adds this is crucial for agents facing novel situations (new webpages, appliances).
- Research Direction: Focusing on active, interactive learning and model building ("everyday science") represents a significant shift from passive, data-driven approaches, potentially leading to more adaptable and truly intelligent agents.
Call for Collaboration
- Basis is actively seeking researchers (scientists, engineers) interested in working on these challenging problems outside the mainstream focus of larger labs. They are also open to collaborations with others working in adjacent areas.
Conclusion
The discussion underscores a critical shift towards hybrid AI systems blending symbolic reasoning's structure, neural learning's pattern recognition, and active interaction for model building. Investors and researchers must track developments in compositional models, learned abstractions, and interactive learning paradigms, as these represent key frontiers for creating more capable and human-like AI beyond pure scaling.