Gemma 4 12B Quant Comparison – q8 vs q4 – 16GB VRAM Local LLM setup



In this video I test Google's Gemma 4 12B model and compare the speed and quality of its 8-bit and 4-bit quantizations. The conclusion is not what I expected.

The model runs on a local AI PC I built, with 16GB VRAM and 32GB DDR4 RAM.

I run the model through four tests:
1. Performance
2. Memory
3. Coding
4. Agency

If you’re interested in local LLMs, AI and homelabs from the perspective of a software engineer with many years of professional experience working with LLMs in production – feel free to subscribe!

Model: https://huggingface.co/unsloth/gemma-4-12b-it-GGUF
Config, prompts and overviews: https://github.com/lukesdevlab/youtube
Patreon: https://www.patreon.com/cw/LukesDevLab

#localllm #localai #homelab #llamacpp #gemma4 #quantization

Chapters:
0:00 Intro
0:33 Model
0:49 System Specs
1:00 Tests Overview
1:25 Performance
2:30 Memory
3:44 Coding
10:44 Agency
12:48 Conclusion
