Current frontier models perform billions of multiplications per token yet fail at basic arithmetic, necessitating a move from empirical pattern matching to a formal "periodic table" of neural architectures rooted in Category Theory.
Chronological Deep Dives
The Algorithmic Failure of Frontier Models
- Frontier models like GPT-4 and Veo (Google's generative video model) mimic reasoning through pattern recognition but collapse on simple algorithmic tasks. Petar Veličković notes that while their outputs look convincing, the models lack the precision required for robotics or scientific discovery.
- LLMs fail at addition once simple "tricks" and memorized patterns are stripped away, showing they do not internalize algorithmic procedures (a probe of this failure mode is sketched at the end of this section).
- Frontier models expend billions of multiplications per token yet cannot reliably multiply two small numbers.
- External tool use (calculators or Model Context Protocol servers) acts as a patch rather than a structural fix for reasoning.
- Internalizing computation is essential for efficiency, as constant tool calling creates significant latency and reasoning bottlenecks.
“Even the best tool in the world is not going to save you if you cannot predict the right inputs for that tool.”
Speaker Attribution: Petar Veličković
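The addition failure is straightforward to probe once memorized shortcuts are removed. The sketch below is a hypothetical test harness, not something from the talk: it generates carry-heavy operands (every column sum forces a carry) together with exact answers to score a model against; the `query_model` mentioned in the comment is a placeholder for whichever model API is under test.

```python
import random

def carry_heavy_operands(n_digits: int, rng: random.Random):
    """Build two operands whose column sums force a carry in every
    position, removing the 'easy' patterns a model could memorize."""
    a = int("".join(str(rng.randint(5, 9)) for _ in range(n_digits)))
    b = int("".join(str(rng.randint(5, 9)) for _ in range(n_digits)))
    return a, b

def make_probes(n_probes: int = 20, n_digits: int = 12, seed: int = 0):
    """Return (prompt, ground_truth) pairs for an addition stress test."""
    rng = random.Random(seed)
    probes = []
    for _ in range(n_probes):
        a, b = carry_heavy_operands(n_digits, rng)
        probes.append((f"What is {a} + {b}? Answer with digits only.", str(a + b)))
    return probes

if __name__ == "__main__":
    for prompt, answer in make_probes(n_probes=3):
        # query_model(prompt) would go here -- a hypothetical stub for the
        # model under test; scoring is exact string match against `answer`.
        print(prompt, "->", answer)
```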
The Limits of Geometric Deep Learning
- Geometric Deep Learning (GDL) uses group theory to build Equivariance (a property where transforming an input produces a predictably transformed output) into models. Because every group element has an inverse, GDL implicitly assumes its transformations are invertible, an assumption that classical algorithms violate.
- GDL handles spatial regularities like image rotation or graph permutation by assuming no information is lost.
- Classical algorithms like Dijkstra's shortest-path algorithm are non-invertible because they destroy information during execution; taking a minimum, for instance, discards the candidates that lost.
- Transformers are inherently permutation equivariant, which explains much of their efficiency but also their limitations on non-invertible reasoning (see the sketch at the end of this section).
- Researchers are moving toward Category Theory to express "post-conditions" and "pre-conditions" that group theory cannot capture.
“Groups, which are the bread and butter of geometric deep learning, might not be enough for aligning to computation.”
Speaker Attribution: Petar Veličković
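Permutation equivariance is concrete enough to check numerically. The following minimal NumPy sketch (random weights, no positional encodings, illustrative only) verifies that a single self-attention head satisfies f(P·X) = P·f(X): permuting the input tokens permutes the outputs in exactly the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # sequence length, model width

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with no positional encoding."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V

X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

P = np.eye(n)[rng.permutation(n)]          # a random permutation matrix

# Equivariance check: f(P X) == P f(X) -- reordering the tokens reorders
# the outputs identically, because attention never "knows" token order.
lhs = self_attention(P @ X, Wq, Wk, Wv)
rhs = P @ self_attention(X, Wq, Wk, Wv)
print(np.allclose(lhs, rhs))               # True
```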
Category Theory as the Periodic Table of AI
- Deep learning currently operates like alchemy, relying on empirical results without a unifying framework. Andrew Dudanev argues that Category Theory provides a synthetic mathematical foundation to derive architectures rather than discovering them by trial and error.
- Analytic mathematics focuses on what things are made of, while synthetic mathematics focuses on the rules of inference and the relationships between objects.
- Category Theory uses Morphisms (generalized functions, or arrows, representing relationships between objects) to describe structure abstractly; a toy illustration in code closes this section.
- The framework allows researchers to treat different neural architectures as instances of the same fundamental mathematical laws.
- This structuralist approach aims to unify the probabilistic, neuroscience, and gradient-based perspectives of AI.
“Categorical deep learning is an attempt to find that periodic table for neural networks.”
Speaker Attribution: Andrew Dudanev
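For readers new to the vocabulary, the simplest example of a category is Python types with functions as the morphisms between them. The sketch below is purely illustrative and assumes nothing from the categorical deep learning framework itself; it only shows composition, identity, and the laws they must satisfy.

```python
from typing import Callable, TypeVar

A = TypeVar("A"); B = TypeVar("B"); C = TypeVar("C")

def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Composition of morphisms: (g . f)(x) = g(f(x))."""
    return lambda x: g(f(x))

def identity(x: A) -> A:
    """The identity morphism on any object (type)."""
    return x

# Two morphisms in the "category" of Python types and functions.
tokenize: Callable[[str], list] = str.split     # str -> list
count:    Callable[[list], int] = len           # list -> int

pipeline = compose(count, tokenize)             # str -> int

# Category laws, checked on a sample input.
x = "category theory as a periodic table"
assert pipeline(x) == count(tokenize(x))                      # definition
assert compose(identity, tokenize)(x) == tokenize(x)          # left identity
assert compose(tokenize, identity)(x) == tokenize(x)          # right identity
assert compose(compose(count, tokenize), identity)(x) == \
       compose(count, compose(tokenize, identity))(x)         # associativity
```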
Formalizing Weight Tying with Two-Morphisms
- Weight tying (sharing parameters across different parts of a network) is standard practice in RNNs and Transformers, yet it has lacked a formal theoretical treatment. Higher category theory provides the language to prove when weight sharing preserves the required computational structure.
- Two-morphisms (relationships between relationships) model the ways different neural network maps relate to one another.
- Weight tying is formalized as a reparameterization in a two-category of parametric maps (a minimal operational sketch closes this section).
- This abstraction allows for weight sharing that goes beyond simple copying, enabling complex algebraic relationships between parameters.
- Higher categories may explain emergent effects where the behavior of a composite system differs from its individual parts.
“Two-morphisms allow us to see this algebraic structure encoded as relationships between the weights.”
Speaker Attribution: Andrew Dudanev
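A minimal NumPy sketch of what the reparameterization amounts to operationally, assuming a toy unrolled recurrence rather than the formal two-category of parametric maps: the tied model is the untied one precomposed with a copying map, and the same mechanism admits sharing that is not plain copying.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 6                        # hidden width, number of unrolled steps

# Untied model: an independent weight matrix at every step (T * d * d params).
untied = [rng.normal(size=(d, d)) for _ in range(T)]

# Weight tying as reparameterization: a single parameter W is copied into
# every slot, i.e. the tied model is the untied one precomposed with the
# copying map  W |-> (W, W, ..., W), leaving only d * d parameters.
W = rng.normal(size=(d, d))
tied = [W] * T                     # every slot is literally the same array

# Sharing beyond plain copying: slots can be related by a non-trivial map,
# e.g. alternating W with its transpose, while still sharing one parameter.
tied_alt = [W if t % 2 == 0 else W.T for t in range(T)]

def run(weights, h):
    """Unroll one recurrent step per weight matrix: h <- tanh(W_t h)."""
    for Wt in weights:
        h = np.tanh(Wt @ h)
    return h

x = rng.normal(size=d)
print(run(untied, x), run(tied, x), run(tied_alt, x), sep="\n")
```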
The Carry Problem and Neural CPUs
- A fundamental gap in Graph Neural Networks (GNNs) is the inability to model a "carry" (the mechanism in addition where a column sum of ten or more overflows into the next column). Andrew Dudanev suggests that geometric subtleties in continuous space could finally enable "CPUs in neural networks."
- Carrying is trivial in discrete mathematics but extremely difficult to implement in continuous, gradient-based systems (the discrete version is spelled out at the end of this section).
- The Hopf fibration (a decomposition of the 3-sphere into circles fibered over the ordinary 2-sphere) provides a potential geometric model for the carrying phenomenon.
- Current systems are trained to always provide an answer rather than recognizing when a problem exceeds their computational budget.
- The goal is a system that understands the "effort" required for a task and provides convergence guarantees.
“Are there ways to exploit this type of geometric subtlety to create the phenomenon of carrying and actually properly model algorithmic reasoning?”
Speaker Attribution: Andrew Dudanev
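For contrast with the continuous setting, the discrete carry is only a few lines of Python. The sketch below states the grade-school algorithm explicitly; this sequential, branching behavior is what a "neural CPU" would have to reproduce under gradient descent.

```python
def add_with_carry(a_digits, b_digits, base=10):
    """Grade-school addition: least-significant digit first, explicit carry.

    Trivial as a discrete program; the open question is realizing the same
    behavior inside a continuous, gradient-trained network.
    """
    out, carry = [], 0
    for x, y in zip(a_digits, b_digits):
        s = x + y + carry
        out.append(s % base)            # digit that stays in this column
        carry = s // base               # overflow pushed to the next column
    if carry:
        out.append(carry)
    return out

# 958 + 647 = 1605, digits stored least-significant first.
print(add_with_carry([8, 5, 9], [7, 4, 6]))   # -> [5, 0, 6, 1]
```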
Investor & Researcher Alpha
- The New Bottleneck: Capital is shifting from "scaling laws" (more data and compute) to "architectural priors." Investors should look for teams building Categorical Deep Learning frameworks that sharply reduce data requirements by imbuing models with structural logic rather than leaving it to be learned from examples.
- System 2 Architectures: Purely autoregressive models are reaching a plateau in reasoning. The next alpha lies in "System 2" systems that integrate neural pattern matching with algorithmic robustness, similar to AlphaGeometry or FunSearch.
- Obsolete Research: Research focusing on "patching" LLMs with external tools for basic logic is a dead end. The industry is moving toward internalizing these operations through non-invertible Monoids (algebraic structures similar to groups but without required inverses).
Strategic Conclusion
Category Theory is the necessary bridge to move AI from stochastic parrots to verifiable reasoners. By formalizing non-invertible computation and weight tying, researchers can build architectures that respect the laws of logic. The industry must now prioritize structural synthesis over empirical scaling.