Gemini 3 (FINAL Checkpoints Tested): I TESTED EVERY CHECKPOINT of Gemini-3. It’s dropping this month

In this video, I recap my hands-on tests with the Gemini 3 checkpoints and what the recent Vertex AI listing might mean for the launch. I cover the 2HT, ECPT, and X28 checkpoints, where each one shines or stumbles, pricing expectations, tool-calling, and how I plan to benchmark the public preview the moment it drops.


Key Takeaways:

🔎 Vertex AI briefly listed “Gemini 3.0 Pro (preview)” with an 11-2025 date, suggesting an imminent launch.
🧪 Recap of checkpoints: 2HT was excellent, ECPT felt nerfed, and X28 has been the strongest so far.
🧠 Strong reasoning paired with a slow first token on the best checkpoints hints at a “thinking” variant.
🎯 Consistency is notably higher than many peers; repeated runs produce similar, coherent outputs.
🧩 Big wins in one-shot code for 3D/Three.js, clean UI, and SVG/Blender; the Minecraft and Pokéball demos improved; the butterfly sim is strong but sometimes clips.
🛠️ Tool-calling looks promising in early tests (Roo Human Relay) but needs reliable multi-step chaining; likely trained for Gemini CLI/Jules patterns.
💸 Pricing expectations: Sonnet-level or lower would make it a clear value; pricing above Sonnet would need stronger tool-call reliability and throughput to justify.
⚖️ Benchmarking advice: don’t judge by “WebOS” demos; push math, 3D, and multi-file flows; test regeneration stability and latency to first token (see the sketch after this list).
🚢 Likely rollout: Pro preview first, with Flash close behind; the “Ultra” label is uncertain, but some checkpoints feel Ultra-tier.
📈 I’ll publish full public-preview benchmarks: token economics, latency, tool-call pass rates, and stability across sessions.
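
Since latency to first token and regeneration stability come up in both the checkpoint notes and the benchmark plan, here is a minimal sketch of how I’d measure time-to-first-token across repeated runs. It assumes an OpenAI-compatible streaming chat endpoint; the BASE_URL, model id, and API_KEY environment variable are placeholders for illustration, not confirmed Gemini 3 details.

```python
# Minimal sketch: measure time-to-first-token (TTFT) and regeneration stability
# against a streaming, OpenAI-compatible chat endpoint. BASE_URL, MODEL, and
# the API_KEY env var are placeholders, not confirmed values.
import json
import os
import statistics
import time

import requests

BASE_URL = "https://example-gateway.invalid/v1/chat/completions"  # placeholder endpoint
MODEL = "gemini-3-pro-preview"  # hypothetical model id
PROMPT = "Write a Three.js scene with a spinning, textured cube."


def time_to_first_token(prompt: str) -> float:
    """Stream one completion and return seconds until the first content delta."""
    headers = {"Authorization": f"Bearer {os.environ['API_KEY']}"}
    body = {
        "model": MODEL,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    start = time.perf_counter()
    with requests.post(BASE_URL, json=body, headers=headers, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            # Server-sent events arrive as lines prefixed with "data: ".
            if not raw or not raw.startswith(b"data: "):
                continue
            payload = raw[len(b"data: "):]
            if payload == b"[DONE]":
                break
            delta = json.loads(payload)["choices"][0]["delta"]
            if delta.get("content"):  # first visible token arrived
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any content token")


if __name__ == "__main__":
    # Repeat the same prompt to gauge latency spread / regeneration stability.
    latencies = [time_to_first_token(PROMPT) for _ in range(5)]
    print(f"TTFT median {statistics.median(latencies):.2f}s, "
          f"spread {max(latencies) - min(latencies):.2f}s")
```

Extending the same loop to count output tokens and compare repeated completions would cover the token-economics and consistency columns of the planned benchmark as well.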
