Gemini 3 has completely dominated the AI space's attention over the last week, but is the hype warranted? The benchmarks tell us the model is VERY powerful, but are they a real representation of how Gemini 3 will perform on real-world tasks? And if LLM benchmarks can't be trusted, what is the solution?
All these questions are answered in this video.
~~~~~~~~~~~~~~~~~~~~~~~~~~
– Join me on Saturday, November 29th at 9:00 AM CST for my next livestream! I’ll be unveiling my new system for remote agentic coding and giving it away to you, but ONLY during the event itself! Go enable notifications for it here:
– If you want to master AI coding assistants and learn how to build systems for reliable and repeatable results, check out the new Agentic Coding Course in Dynamous:
https://dynamous.ai/agentic-coding-course
– Article on Gemini 3 (with benchmarks):
https://blog.google/products/gemini/gemini-3/
– Google’s New Antigravity:
https://antigravity.google/
– Cline Bench:
https://cline.bot/blog/cline-bench-initiative
~~~~~~~~~~~~~~~~~~~~~~~~~~
00:00 – Is Gemini 3 Really That Powerful?
01:39 – The Problem with LLM Benchmarks
04:04 – Example of Better Tools Making LLMs Seem Better
06:02 – Live Demo of Gemini 3 in Antigravity
07:25 – Gemini 3’s ARC-AGI-2 Benchmark Score is Impressive
07:53 – Introducing the Solution to LLM Benchmarks
08:47 – Cline Bench – Real-World LLM Evals for AI Coding
11:30 – Final Thoughts
~~~~~~~~~~~~~~~~~~~~~~~~~~
Join me as I push the limits of what is possible with AI. I'll be uploading videos at least weekly – every Wednesday at 7:00 PM CDT!
