I Tested the First Diffusion Reasoning LLM… It’s Insanely Fast
You can try Mercury 2 here:
M2 Playground: https://chat.inceptionlabs.ai/
M2 API: http://platform.inceptionlabs.ai/
Inception gave me early access and I made this video in partnership with them.
I walk through how I test a new reasoning LLM called Mercury 2 and show why it’s so fast.
I ask it to build a full game of checkers, and it writes working code almost instantly. Then I push it to create a full game of chess, and it generates hundreds of lines of code in seconds. I also show how it handles follow-up prompts and rewrites the code just as fast.
Mercury 2 is different from other large language models like ChatGPT and Claude Haiku. Most LLMs are autoregressive: they generate one token at a time, left to right. Mercury 2 uses a diffusion model, which drafts and refines many tokens in parallel. That’s why this diffusion LLM can reach around 1,000 tokens per second.
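To see why parallel refinement is faster, here’s a toy sketch (not Mercury 2’s actual algorithm, just the intuition): an autoregressive decoder needs one step per token, while a diffusion-style decoder starts from a fully masked draft and reveals several positions per round.

```python
import random

random.seed(0)

TARGET = "the quick brown fox jumps over the lazy dog".split()
MASK = "<mask>"

def autoregressive_steps(tokens):
    """Sequential decoding: one token per step, so steps == len(tokens)."""
    return len(tokens)

def diffusion_steps(tokens, fill_per_round=3):
    """Toy parallel refinement: begin fully masked, then each round
    'denoise' several positions at once until nothing is masked."""
    draft = [MASK] * len(tokens)
    rounds = 0
    while MASK in draft:
        masked = [i for i, t in enumerate(draft) if t == MASK]
        # Reveal up to `fill_per_round` positions in parallel this round.
        for i in random.sample(masked, min(fill_per_round, len(masked))):
            draft[i] = tokens[i]
        rounds += 1
    return rounds

print("autoregressive steps:", autoregressive_steps(TARGET))  # 9
print("diffusion-style rounds:", diffusion_steps(TARGET))     # 3
```

Nine tokens take nine sequential steps but only three parallel rounds here; a real diffusion LLM does far more sophisticated denoising, but the speedup comes from the same idea.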
I compare Mercury 2 to Claude Haiku in a speed test, and Mercury 2 finishes much faster while still keeping strong reasoning. This makes it a great fit for AI agents, coding, voice apps, search tools, and customer service apps where you need both speed and reasoning.
If you’re building with an API and need a fast reasoning LLM, Mercury 2 is worth testing in the playground or API.
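If you want to wire it into your own code, here’s a minimal sketch of assembling a chat-completions request. The base URL, model name, and auth header are assumptions for illustration; check the Inception platform docs for the real values.

```python
import json

# Hypothetical values -- confirm the real endpoint path, model name,
# and auth scheme in the Inception platform docs before sending anything.
BASE_URL = "https://api.inceptionlabs.ai/v1"  # assumption
MODEL = "mercury"                             # assumption

def build_chat_request(prompt, max_tokens=512):
    """Assemble an OpenAI-style chat-completions payload; many hosted
    LLM APIs follow this shape, so it is a reasonable starting point."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": "Bearer $INCEPTION_API_KEY",  # placeholder key
            "Content-Type": "application/json",
        },
        "body": {
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
    }

req = build_chat_request("Write a minimal checkers board in Python.")
print(json.dumps(req["body"], indent=2))
# To actually send it: requests.post(req["url"], headers=req["headers"], json=req["body"])
```

This only builds and prints the payload; swap in your API key and POST it to try the model from your own code.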
