Text diffusion: A new paradigm for LLMs
Text diffusion is a new paradigm for LLMs. Unlike mainstream autoregressive models such as GPT, Claude, or Gemini, which predict one token at a time, diffusion-based LLMs draft an entire response and refine it progressively. This parallel refinement can make inference up to ~10x faster.
Models like Gemini Diffusion, Inception Labs' Mercury Coder, and ByteDance's Seed Diffusion are already competitive on coding benchmarks.
Inspired by physical diffusion, these models use Markov chains to model data generation as a particle hopping between discrete states. We'll walk through the D3PM and LLaDA papers as case studies.
📖 Papers:
Full reading list: https://www.patreon.com/posts/papers-diffusion-140452266
D3PM: https://arxiv.org/abs/2107.03006
LLaDA: https://arxiv.org/abs/2502.09992
Scaling up Masked Diffusion Models on Text: https://arxiv.org/abs/2410.18514
▶️ The physics behind diffusion models: https://youtu.be/R0uMcXsfo2o?si=OqdGg4TPefSNTK3t
00:00 Intro
01:04 Auto-regressive vs diffusion LLMs
02:06 Why bother with diffusion for text?
06:30 The probability landscape
07:57 Diffusion in latent embedding space
11:00 Diffusion in token embedding space
12:13 Diffusion in text token space
13:49 Markov chains
16:46 Paper study: D3PM
19:42 Paper study: LLaDA
22:30 Evaluation
