This episode explores the evolution of program synthesis, contrasting neural network approaches with symbolic methods, and highlights the challenges and potential of integrating these paradigms for advanced AI generalization.
Deep Learning's Limitations in Program Synthesis
- François Chollet, creator of Keras, initially believed deep learning could replace programming entirely. He envisioned gradient descent as a universal programming method, capable of training neural networks for any task given enough examples.
- Around 2015-2016, Chollet worked with Christian Szegedy at Google, attempting to use deep learning for theorem proving. They aimed to guide a symbolic theorem prover using neural networks trained to interpret higher-order logic statements.
- Chollet encountered significant difficulties, realizing that the neural networks latched onto statistical regularities (effectively noise) rather than implementing the desired parsing program. He states, "no matter what you tried, the neural network would always try to latch onto statistical regularities, noise effectively, and would not be able to actually implement the parsing program I wanted it to implement."
- This led him to conclude that deep learning excels at pattern matching in continuous spaces but struggles with discrete, symbolic programs, for which gradient descent is not optimal.
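The contrast above can be made concrete with a toy example. The sketch below (illustrative only; the primitives and task are invented, not from the episode) shows the discrete alternative to gradient descent: enumerating compositions of symbolic building blocks until one fits the input-output examples exactly. A search like this either finds a program that is exactly right or fails outright, whereas gradient descent can only nudge continuous parameters toward an approximation.

```python
from itertools import product

# Hypothetical primitive operations; names are illustrative.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg": lambda x: -x,
}

def synthesize(examples, max_depth=3):
    """Enumerate compositions of primitives until one fits every example.

    Discrete search returns a program that is exactly consistent with the
    examples; there is no gradient to follow between candidate programs.
    """
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(x, names=names):
                for name in names:
                    x = PRIMITIVES[name](x)
                return x
            if all(program(i) == o for i, o in examples):
                return names, program
    return None

# Recover "double, then increment" from two examples.
names, program = synthesize([(1, 3), (4, 9)])
print(names)  # ('double', 'inc')
```

The cost of this exactness is combinatorial blow-up: the search space grows exponentially with program depth, which is precisely the scaling problem that motivates guiding symbolic search with learned models.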
Learning Mechanisms vs. Representations
- The discussion shifts to whether the limitations are due to the learning mechanism (gradient descent) or the representation (neural networks).
- Chollet asserts the primary bottleneck is the learning mechanism: "the primary problem is gradient descent is just not the way to learn programs."
- While acknowledging the need for better representations, he emphasizes that even with an ideal representation, gradient descent would still struggle to learn algorithms.
- Vector spaces are suitable for continuous problems, but embedding discrete structures in continuous spaces doesn't offer benefits for interpolation or movement within that space.
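The last point can be illustrated with a minimal sketch (my own example, not from the episode): embed discrete symbols as one-hot vectors and interpolate between two of them. The midpoint is a perfectly valid vector, but it corresponds to no symbol, so the continuous space between embeddings carries no usable meaning.

```python
# Illustrative sketch: embedding discrete symbols in a vector space
# does not make the space *between* them meaningful.
TOKENS = ["if", "while", "return"]

def one_hot(token):
    """Embed a token as a one-hot vector over the vocabulary."""
    return [1.0 if t == token else 0.0 for t in TOKENS]

def midpoint(a, b):
    """Linear interpolation halfway between two embeddings."""
    return [(x + y) / 2 for x, y in zip(a, b)]

mid = midpoint(one_hot("if"), one_hot("while"))
# [0.5, 0.5, 0.0] is a valid point in the space, yet decodes to no token:
print(mid in [one_hot(t) for t in TOKENS])  # False
```

Learned embeddings in real models are denser than one-hots, but the underlying issue is the same: interpolating between two valid programs does not generally yield a valid program.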
Hybrid Substrates and Deep Integration
- The conversation explores the potential for a hybrid substrate—a data structure and learning process that combines the characteristics of continuous (neural networks) and discrete (symbolic search) approaches.
- One idea involves deeper integration of neural networks into programming languages, beyond simple function calls.
- An analogy is presented: debugging a Python program, where a neural network could control the program's execution dynamics at each step, inspecting the stack and potentially revising previous steps. Chollet mentions, "one kind of mental model I have in my mind is of debugging."
- Another approach is to use neural networks to implement the operations of a program interpreter, focusing on behavioral equivalence rather than structural equivalence.
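One way to picture that last idea is an interpreter whose primitive operations are pluggable callables; in the hybrid scheme discussed, each op could be a trained neural module, while here plain functions stand in. This is a hedged sketch of the framing, not an implementation from the episode. Equivalence is judged behaviorally, by comparing outputs on test inputs, rather than structurally, by comparing program text.

```python
def make_interpreter(ops):
    """Build a stack-machine interpreter from a dict of named operations.

    Each op mutates the stack in place; in a neural variant, ops would be
    learned modules rather than hand-written functions.
    """
    def run(program, stack):
        stack = list(stack)  # don't mutate the caller's stack
        for instr in program:
            ops[instr](stack)
        return stack
    return run

def behaviorally_equivalent(run_a, prog_a, run_b, prog_b, test_inputs):
    """Two programs are equivalent if they agree on all tested inputs,
    regardless of how differently they are written."""
    return all(run_a(prog_a, s) == run_b(prog_b, s) for s in test_inputs)

exact_ops = {
    "dup": lambda s: s.append(s[-1]),
    "add": lambda s: s.append(s.pop() + s.pop()),
}
alt_ops = {"double": lambda s: s.append(s.pop() * 2)}

run = make_interpreter(exact_ops)
run_alt = make_interpreter(alt_ops)

# "dup; add" and "double" are structurally different, behaviorally the same:
print(behaviorally_equivalent(run, ["dup", "add"],
                              run_alt, ["double"],
                              [[1], [7], [-3]]))  # True
```

The appeal of behavioral equivalence is that it gives a training signal: a learned op is correct insofar as the whole interpreter produces the right outputs, even if its internals look nothing like the symbolic original.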
Infrastructure and the Future of Program Synthesis
- Chollet believes current infrastructure for program synthesis is inadequate, comparing it to the early days of deep learning (around 2014).
- He anticipates a "breakthrough moment" and crystallization of understanding, leading to the development of specialized infrastructure, similar to how Keras built upon automatic differentiation for deep learning.
- The discussion touches on classical program synthesis techniques (abstraction, abstract interpretation, SAT/SMT solvers) and the rise of large language models (LLMs) for code generation.
- Chollet suggests the future is more learned and data-driven but believes symbolic abstractions can be learned via symbolic search. He attributes the current dominance of LLMs to their scale and the vast resources invested in them.
Scaling, Knowledge Representation, and the Role of Learning
- The conversation addresses the failure of projects like Cyc (which attempted to scale symbolic knowledge) and the success of GPT-like models.
- Chollet attributes Cyc's limitations to its reliance on human labor, emphasizing the need for delegating tasks to computers for scalability.
- He also points out that classical ontologies are not good representations of knowledge, as the world is not encoded in graphs.
- Vector spaces and embeddings are considered intrinsically better representations of knowledge for most data types, explaining the success of neural networks across various modalities.
ARC (Abstraction and Reasoning Corpus) and Generalization
- The discussion turns to the ARC challenge, a benchmark designed to test AI's ability to generalize and adapt to novelty.
- Chollet's biggest insight from ARC is that it's possible to make LLMs and deep learning models adapt to novelty, with test-time training being a key technique.
- He distinguishes between compositional novelty (recombining elementary building blocks) and other forms of novelty. Transformers struggle with function composition, a key aspect of compositional novelty.
- A new version of the ARC dataset (ARC 2) is planned, focusing on stronger generalization and compositional complexity, with fewer tasks solvable by brute-force search.
ARC's Continued Relevance and the Marrow Project
- The speakers discuss the continued relevance of ARC for the Marrow project, which aims to develop AI systems that can actively learn and experiment.
- Chollet advocates for continuing to work on ARC, describing it as a "micro-world" for important generalization and on-the-fly adaptation problems.
- He highlights ARC's minimalistic nature, focusing on abstraction and generalization without the complexities of specialized knowledge (e.g., programming languages) found in other benchmarks.
The conversation highlights that ARC provides a focused environment for studying generalization, crucial for advancing AI. Investors and researchers should monitor developments in test-time training and hybrid approaches, as these are key to achieving stronger AI generalization capabilities.