Machine Learning Street Talk
March 23, 2025

Exploring Program Synthesis: Francois Chollet, Kevin Ellis, Zenna Tavares

This episode dives into the fascinating world of program synthesis, exploring the limitations of deep learning, the potential of symbolic approaches, and the quest for hybrid models that can effectively learn and generalize. Francois Chollet, creator of Keras, joins Kevin Ellis and Zenna Tavares to discuss the challenges and opportunities in this emerging field.

The Limits of Deep Learning for Program Synthesis

  • “Around 2016… I thought you could use gradient descent as a full replacement for programming… but the neural network would always try to latch onto statistical regularities… and would not be able to actually implement the parsing program I wanted.”
  • “Gradient descent is just not the way to learn algorithms like this; you actually need this good search.”
  • Deep learning excels at pattern matching in continuous spaces but struggles with discrete, symbolic tasks like program synthesis.
  • Gradient descent, the core optimization algorithm in deep learning, tends to overfit to noise and fails to find generalizable program solutions (a toy probe of this behavior is sketched after this list).
  • Even when initialized with the correct solution, neural networks can unlearn it and converge to an overfit solution when trained further.
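
A minimal, self-contained probe of this failure mode (not the experiment Chollet describes): fit a small MLP by plain gradient descent to most of the input/output pairs of a discrete function, 5-bit parity, and check the held-out patterns. The architecture, learning rate, and task are illustrative choices, not anything from the episode.

```python
# Toy probe, not Chollet's experiment: train a tiny MLP with plain gradient
# descent on 24 of the 32 input/output pairs of 5-bit parity (a purely
# discrete, algorithmic target) and see how it behaves on the held-out 8.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[(i >> b) & 1 for b in range(5)] for i in range(32)], dtype=float)
y = X.sum(axis=1) % 2

idx = rng.permutation(32)
train, test = idx[:24], idx[24:]

# Two-layer MLP, squared-error loss, vanilla gradient descent.
W1 = rng.normal(0, 0.5, (5, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.5
for _ in range(5000):
    h = np.tanh(X[train] @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))              # sigmoid output
    g = (p[:, 0] - y[train])[:, None] * p * (1 - p)   # dLoss/dLogit
    dW2 = h.T @ g / len(train); db2 = g.mean(axis=0)
    dh = g @ W2.T * (1 - h ** 2)
    dW1 = X[train].T @ dh / len(train); db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

def predict(x):
    return (1 / (1 + np.exp(-(np.tanh(x @ W1 + b1) @ W2 + b2)))[:, 0] > 0.5).astype(int)

print("train accuracy:   ", (predict(X[train]) == y[train]).mean())
print("held-out accuracy:", (predict(X[test]) == y[test]).mean())
```

Whether the held-out accuracy lands near chance or near-perfect depends on the seed and capacity; the point is only that nothing in the procedure pushes toward the exact parity algorithm, which is the distinction Chollet draws.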

The Potential of Symbolic Approaches and Hybrid Models

  • “The representation problem… is real… but it’s not the primary bottleneck. You could do, in principle, everything with neural networks if you had the proper learning mechanism.”
  • “If you have a problem that is more discrete in nature… it is clearly not an optimal choice [to use neural networks]. There are no benefits to doing so.”
  • Symbolic methods, though they have received far less research investment than deep learning, hold promise for program synthesis.
  • While neural networks can represent discrete structures, they offer no advantages for manipulating them.
  • The ideal approach may involve a hybrid model that combines the strengths of both continuous and discrete representations, potentially integrating neural networks deeper into the semantics of programming languages; a minimal sketch of the discrete-search ingredient follows this list.
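
Where the discrete route pays off is in how directly candidate programs can be enumerated and checked against examples. The sketch below uses a tiny invented DSL of list operations (all primitive names are illustrative); a hybrid system would replace the blind enumeration with a learned model that proposes or ranks candidates.

```python
# Minimal sketch of discrete, search-based synthesis over a tiny hypothetical
# DSL: enumerate compositions of primitives until one reproduces the examples.
from itertools import product

PRIMITIVES = {
    "reverse":    lambda xs: xs[::-1],
    "sort":       lambda xs: sorted(xs),
    "drop_first": lambda xs: xs[1:],
    "double":     lambda xs: [2 * x for x in xs],
}

def search(examples, max_depth=3):
    """Return the shortest composition of primitives consistent with the examples."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(xs, names=names):
                for n in names:
                    xs = PRIMITIVES[n](xs)
                return xs
            if all(run(inp) == out for inp, out in examples):
                return names
    return None

examples = [([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])]
print(search(examples))   # e.g. ('sort', 'double')
```

Exhaustive enumeration blows up quickly with depth and vocabulary size, which is exactly the gap a learned proposal or ranking model is meant to fill.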

The Importance of Infrastructure and Benchmarks

  • “I really don’t think we have the right foundations today… In the future there will be a Keras for program synthesis, I'm quite sure.”
  • “ARC is this very clean, very minimalistic microworld… it’s all about abstraction and generalization.”
  • Current infrastructure is insufficient for program synthesis; more research is needed to understand what works and scales before building dedicated frameworks.
  • Benchmarks like ARC, while not perfect, provide valuable microworlds for studying generalization and on-the-fly adaptation, isolating core challenges without requiring extensive domain knowledge (see the task-loading sketch after this list).
  • ARC 2 promises to focus on tasks requiring stronger generalization, emphasizing compositional novelty that challenges current AI systems.
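
For concreteness, here is a minimal sketch of consuming one such microworld task, assuming the JSON layout of the public fchollet/ARC repository (each task file holds "train" and "test" lists of {"input": grid, "output": grid} pairs); the candidate transform and the file name are purely illustrative.

```python
# Sketch only: the file name and the 'transpose' candidate are illustrative.
# Assumes the JSON layout of the public fchollet/ARC repository.
import json

def load_task(path):
    with open(path) as f:
        return json.load(f)

def consistent(task, program):
    """Keep a candidate program only if it reproduces every demonstration pair."""
    return all(program(pair["input"]) == pair["output"] for pair in task["train"])

transpose = lambda grid: [list(row) for row in zip(*grid)]  # one toy candidate

task = load_task("data/training/0a1b2c3d.json")             # hypothetical file
if consistent(task, transpose):
    predictions = [transpose(pair["input"]) for pair in task["test"]]
    print(predictions)
```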

Key Takeaways:

  • Deep learning alone is insufficient for program synthesis; symbolic approaches and hybrid models are crucial for tackling discrete, algorithmic tasks.
  • Developing dedicated infrastructure for program synthesis is premature; further research is needed to identify effective, scalable techniques.
  • Benchmarks like ARC are essential for driving progress in program synthesis, providing focused environments to study generalization and adaptation.

Actionable Insights:

  • Deep learning's strength lies in pattern recognition, not program generation. Symbolic methods or hybrid models are key to unlocking the true potential of program synthesis.
  • A "Keras for Program Synthesis" is coming, but not yet. More foundational research is needed before building specialized frameworks.
  • ARC, particularly ARC 2, is a crucial testing ground for stronger generalization in AI, pushing beyond mere interpolation towards true compositional understanding.

For further insights and detailed discussions, watch the full podcast: Link

This episode explores the evolution of program synthesis, contrasting neural network approaches with symbolic methods, and highlights the challenges and potential of integrating these paradigms for advanced AI generalization.

Deep Learning's Limitations in Program Synthesis

  • François Chollet, creator of Keras, initially believed deep learning could replace programming entirely. He envisioned gradient descent as a universal programming method, capable of training neural networks for any task given enough examples.
  • Around 2015-2016, Chollet worked with Christian Szegedy at Google, attempting to use deep learning for theorem proving. They aimed to guide a symbolic theorem prover using neural networks trained to interpret higher-order logic statements.
  • Chollet encountered significant difficulties, realizing neural networks latched onto statistical regularities (noise) rather than implementing the desired parsing program. He states, "no matter what you tried, the neural network would always try to latch onto statistical regularities… noise, effectively… and would not be able to actually implement the parsing program I wanted it to implement."
  • This led him to conclude that deep learning excels at pattern matching in continuous spaces but struggles with discrete, symbolic programs, for which gradient descent is not optimal.

Learning Mechanisms vs. Representations

  • The discussion shifts to whether the limitations are due to the learning mechanism (gradient descent) or the representation (neural networks).
  • Chollet asserts the primary bottleneck is the learning mechanism: "the primary problem is that gradient descent is just not the way to learn programs."
  • While acknowledging the need for better representations, he emphasizes that even with an ideal representation, gradient descent would still struggle to learn algorithms.
  • Vector spaces are suitable for continuous problems, but embedding discrete structures in continuous spaces doesn't offer benefits for interpolation or movement within that space.

Hybrid Substrates and Deep Integration

  • The conversation explores the potential for a hybrid substrate—a data structure and learning process that combines the characteristics of continuous (neural networks) and discrete (symbolic search) approaches.
  • One idea involves deeper integration of neural networks into programming languages, beyond simple function calls.
  • An analogy is presented: debugging a Python program, where a neural network could control the program's execution dynamics at each step, inspecting the stack and potentially revising previous steps. Chollet mentions, "one kind of mental model I have in my mind is of debugging." (A toy version of this control loop is sketched after this list.)
  • Another approach is to use neural networks to implement the operations of a program interpreter, focusing on behavioral equivalence rather than structural equivalence.
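
A toy rendering of the debugger-style control loop described above: a stepwise stack machine whose next operation is chosen by a controller that can inspect the full execution state. The controller here is a trivial heuristic standing in for the learned policy being discussed; every name in the sketch is illustrative.

```python
# Sketch: a stepwise interpreter whose next action is picked by a controller
# that sees the stack and the trace. The controller below is a placeholder
# heuristic; in the hybrid systems discussed it would be a learned policy.
from dataclasses import dataclass, field

@dataclass
class State:
    stack: list = field(default_factory=list)
    trace: list = field(default_factory=list)   # past (op, stack snapshot) pairs

OPS = {
    "push1": lambda s: s.stack.append(1),
    "dup":   lambda s: s.stack.append(s.stack[-1]),
    "add":   lambda s: s.stack.append(s.stack.pop() + s.stack.pop()),
}

def controller(state, target):
    """Placeholder for a learned policy: inspects the stack and picks an op."""
    if not state.stack:
        return "push1"
    if state.stack[-1] < target:
        return "dup" if len(state.stack) % 2 else "add"
    return None  # halt

def run(target, max_steps=20):
    state = State()
    for _ in range(max_steps):
        op = controller(state, target)
        if op is None:
            break
        OPS[op](state)
        state.trace.append((op, list(state.stack)))  # full history stays available
    return state

print(run(4).stack)   # reaches [4] by pushing, duplicating, and adding
```

The point of the framing is that the controller sees the stack and the whole trace at every step, so in principle it could also revise earlier decisions rather than only extend the program.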

Infrastructure and the Future of Program Synthesis

  • Chollet believes current infrastructure for program synthesis is inadequate, comparing it to the early days of deep learning (around 2014).
  • He anticipates a "breakthrough moment" and crystallization of understanding, leading to the development of specialized infrastructure, similar to how Keras built upon automatic differentiation for deep learning.
  • The discussion touches on classical program synthesis techniques (abstraction, abstract interpretation, SAT/SMT solvers) and the rise of large language models (LLMs) for code generation; a tiny constraint-based example follows this list.
  • Chollet suggests the future is more learned and data-driven but believes symbolic abstractions can be learned via symbolic search. He attributes the current dominance of LLMs to their scale and the vast resources invested in them.
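
As a concrete reference point for the SAT/SMT end of the classical toolbox mentioned above, here is a minimal sketch assuming the z3-solver Python package is installed: the "program" is just a linear expression a*x + b whose coefficients the solver recovers from input/output examples.

```python
# Minimal constraint-based synthesis sketch, assuming the z3-solver package:
# each input/output example becomes a constraint, and the solver searches the
# discrete space of coefficient assignments.
from z3 import Ints, Solver, sat

a, b = Ints("a b")
examples = [(0, 3), (1, 5), (4, 11)]   # hidden target: 2*x + 3

s = Solver()
for x, y in examples:
    s.add(a * x + b == y)

if s.check() == sat:
    m = s.model()
    print(f"synthesized: {m[a]}*x + {m[b]}")
else:
    print("no linear program fits the examples")
```

Richer synthesis systems encode the program's structure itself as unknowns, but the recipe is the same: examples become constraints, and a solver handles the discrete search.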

Scaling, Knowledge Representation, and the Role of Learning

  • The conversation addresses the failure of projects like Cyc (which attempted to scale symbolic knowledge) and the success of GPT-like models.
  • Chollet attributes Cyc's limitations to its reliance on human labor, emphasizing the need for delegating tasks to computers for scalability.
  • He also points out that classical ontologies are not good representations of knowledge, as the world is not encoded in graphs.
  • Vector spaces and embeddings are considered intrinsically better representations of knowledge for most data types, explaining the success of neural networks across various modalities.

ARC (Abstraction and Reasoning Corpus) and Generalization

  • The discussion turns to the ARC challenge, a benchmark designed to test AI's ability to generalize and adapt to novelty.
  • Chollet's biggest insight from ARC is that it's possible to make LLMs and deep learning models adapt to novelty, with test-time training being a key technique (a simplified sketch follows this list).
  • He distinguishes between compositional novelty (recombining elementary building blocks) and other forms of novelty. Transformers struggle with function composition, a key aspect of compositional novelty.
  • A new version of the ARC dataset (ARC 2) is planned, focusing on stronger generalization and compositional complexity, with fewer tasks solvable by brute-force search.
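
A deliberately simplified sketch of the test-time-training idea, assuming TensorFlow/Keras is available, grids padded to a fixed 10x10 shape with 10 colour classes, and a throwaway dense model; real ARC pipelines involve augmentation and far stronger models.

```python
# Simplified test-time training sketch: fine-tune a copy of a base model on
# one task's demonstration pairs only, then predict that task's test grid.
# Model, shapes, and hyperparameters are illustrative assumptions.
import numpy as np
import tensorflow as tf

def make_model():
    # Flattened 10x10 grid in, per-cell distribution over 10 colours out.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(100 * 10),
        tf.keras.layers.Reshape((100, 10)),
        tf.keras.layers.Softmax(),
    ])

def test_time_adapt(base_model, demos, test_input, steps=50):
    """Fine-tune a fresh copy of the model on this single task's demo pairs."""
    model = tf.keras.models.clone_model(base_model)
    model.set_weights(base_model.get_weights())
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy")
    x = np.stack([inp for inp, _ in demos]).reshape(len(demos), -1)
    y = np.stack([out for _, out in demos]).reshape(len(demos), -1)
    model.fit(x, y, epochs=steps, verbose=0)       # the "test-time" gradient steps
    pred = model.predict(test_input.reshape(1, -1), verbose=0)
    return pred.argmax(-1).reshape(10, 10)         # back to a colour grid

# Shape check only: random grids stand in for a real task's demonstration pairs.
demos = [(np.random.randint(0, 10, (10, 10)), np.random.randint(0, 10, (10, 10)))]
print(test_time_adapt(make_model(), demos, np.random.randint(0, 10, (10, 10))).shape)
```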

ARC's Continued Relevance and the Marrow Project

  • The speakers discuss the continued relevance of ARC for the Marrow project, which aims to develop AI systems that can actively learn and experiment.
  • Chollet advocates for continuing to work on ARC, describing it as a "micro-world" for important generalization and on-the-fly adaptation problems.
  • He highlights ARC's minimalistic nature, focusing on abstraction and generalization without the complexities of specialized knowledge (e.g., programming languages) found in other benchmarks.

The conversation highlights that ARC provides a focused environment for studying generalization, crucial for advancing AI. Investors and researchers should monitor developments in test-time training and hybrid approaches, as these are key to achieving stronger AI generalization capabilities.
