The “Final Boss” of Deep Learning
We often think of Large Language Models (LLMs) as all-knowing, but as the team reveals, they still struggle with logic that a second-grader can handle. Why can’t ChatGPT reliably add large numbers? Why does it “hallucinate” the laws of physics? The answer lies in the architecture. This episode explores how *Category Theory*, an ultra-abstract branch of mathematics, could provide the “Periodic Table” for neural networks, turning the “alchemy” of modern AI into a rigorous science.
In this deep-dive exploration, *Andrew Dudzik*, *Petar Veličković*, *Taco Cohen*, *Bruno Gavranović*, and *Paul Lessard* join host *Tim Scarfe* to discuss the fundamental limitations of today’s AI and the radical mathematical framework that might fix them.
—
Key Insights in This Episode:
* *The “Addition” Problem:* *Andrew Dudzik* explains why LLMs don’t actually “know” math—they just recognize patterns. When you change a single digit in a long string of numbers, the pattern breaks because the model lacks the internal “machinery” to perform a simple carry operation.
* *Beyond Alchemy:* *Tim Scarfe* argues that deep learning is currently in its “alchemy” phase—we have powerful results, but we lack a unifying theory. Category Theory is proposed as the framework to move AI from trial-and-error to principled engineering. [00:13:49]
* *Algebra with Colors:* To make Category Theory accessible, the guests use brilliant analogies—like thinking of matrices as *magnets with colors* that only snap together when the types match. This “partial compositionality” is the secret to building more complex internal reasoning. [00:09:17]
* *Synthetic vs. Analytic Math:* *Paul Lessard* breaks down the philosophical shift needed in AI research: moving from “Analytic” math (what things are made of) to “Synthetic” math (how things behave and relate to one another). [00:23:31]
* *The 4D Carry:* In a mind-blowing conclusion, the team discusses how simple algorithmic tasks, like “carrying the one” in addition, actually relate to complex geometric structures like *Hopf Fibrations*. [00:39:30]
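
Two of the ideas in the list above can be made concrete in a few lines of Python. This is only an illustrative sketch, not code from the episode or from the Categorical Deep Learning paper: the helper names (`shape`, `compose`, `to_digits`, `add_digits`) are hypothetical and chosen for this example. The first part treats a matrix as an arrow between two "colors" (its column count and its row count) and refuses to compose arrows whose colors do not match. The second part is schoolbook addition with an explicit carry, the step the guests say a pattern-matching model has no internal machinery for.

```python
# Part 1: "algebra with colors".
# A matrix with m rows and n columns is read as an arrow from color n to color m.
# Two arrows snap together only when the colors in the middle agree.

def shape(matrix):
    """Return (rows, columns) of a non-empty list-of-lists matrix."""
    return len(matrix), len(matrix[0])


def compose(g, f):
    """Return the matrix product g @ f, i.e. 'g after f'.

    Raises TypeError when the input color of g differs from the output
    color of f, so composition is only partially defined.
    """
    g_rows, g_cols = shape(g)
    f_rows, f_cols = shape(f)
    if g_cols != f_rows:
        raise TypeError(f"colors do not match: {g_cols} vs {f_rows}")
    return [
        [sum(g[i][k] * f[k][j] for k in range(f_rows)) for j in range(f_cols)]
        for i in range(g_rows)
    ]


# Part 2: addition with an explicit carry.
# Numbers are stored as little-endian digit lists: 123 -> [3, 2, 1].

def to_digits(n, base=10):
    """Convert a non-negative integer to a little-endian digit list."""
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            return digits


def add_digits(a, b, base=10):
    """Add two little-endian digit lists, propagating the carry digit by digit."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        # divmod splits the column sum into the carry and the digit to keep.
        carry, digit = divmod(da + db + carry, base)
        result.append(digit)
    if carry:
        result.append(carry)
    return result
```

For example, `add_digits(to_digits(999999), to_digits(1))` has to push a carry through every position, which is why changing one low-order digit of an input can change every digit of the answer. In the other half of the sketch, `compose(f, f)` raises `TypeError` for a 2-by-3 matrix `f`, because its input and output colors differ.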
—
Why This Matters for AGI
If we want AI to solve the world’s hardest scientific problems, it can’t just be a “stochastic parrot.” It needs to internalize the rules of logic and computation. By imbuing neural networks with categorical priors, researchers are attempting to build a future where AI doesn’t just predict the next word—it understands the underlying structure of the universe.
—
TIMESTAMPS:
00:00:00 The Failure of LLM Addition & Physics
00:01:26 Tool Use vs Intrinsic Model Quality
00:03:07 Efficiency Gains via Internalization
00:04:28 Geometric Deep Learning & Equivariance
00:07:05 Limitations of Group Theory
00:09:17 Category Theory: Algebra with Colors
00:11:25 The Systematic Guide of Lego-like Math
00:13:49 The Alchemy Analogy & Unifying Theory
00:15:33 Information Destruction & Reasoning
00:18:00 Pathfinding & Monoids in Computation
00:20:15 System 2 Reasoning & Error Awareness
00:23:31 Analytic vs Synthetic Mathematics
00:25:52 Morphisms & Weight Tying Basics
00:26:48 2-Categories & Weight Sharing Theory
00:28:55 Higher Categories & Emergence
00:31:41 Compositionality & Recursive Folds
00:34:05 Syntax vs Semantics in Network Design
00:36:14 Homomorphisms & Multi-Sorted Syntax
00:39:30 The Carrying Problem & Hopf Fibrations
—
REFERENCES:
Model:
[00:01:05] Veo
https://deepmind.google/models/veo/
[00:01:10] Genie
https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
Paper:
[00:04:30] Geometric Deep Learning Blueprint
https://arxiv.org/abs/2104.13478
[00:16:45] AlphaGeometry
https://deepmind.google/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
[00:16:55] AlphaCode
https://arxiv.org/abs/2203.07814
[00:17:05] FunSearch
https://www.nature.com/articles/s41586-023-06924-6
[00:37:00] Attention Is All You Need
https://arxiv.org/abs/1706.03762
[00:43:00] Categorical Deep Learning
https://arxiv.org/abs/2402.15332
Thanks to ly3xqhl8g9 on our Discord server for the draft show review!
