In this video I test Google's Gemma 4 12B model and compare the speed and quality of its 8-bit and 4-bit quantizations. The conclusion is not what I expected.
The model runs on a local AI PC I built, with 16GB VRAM and 32GB DDR4 RAM.
I run the model through four tests:
1. Performance
2. Memory
3. Coding
4. Agency
If you’re interested in local LLMs, AI and homelabs from the perspective of a software engineer with many years of professional experience running LLMs in production, feel free to subscribe!
Model: https://huggingface.co/unsloth/gemma-4-12b-it-GGUF
Config, prompts and overviews: https://github.com/lukesdevlab/youtube
Patreon: https://www.patreon.com/cw/LukesDevLab
#localllm #localai #homelab #llamacpp #gemma4 #quantization
Chapters:
0:00 Intro
0:33 Model
0:49 System Specs
1:00 Tests Overview
1:25 Performance
2:30 Memory
3:44 Coding
10:44 Agency
12:48 Conclusion