#quantization
- Google DeepMind Releases Gemma 4 QAT Checkpoints: Sub-1 GB On-Device E2B Model Google DeepMind models-llm
- ViQ: Text-Aligned Visual Quantized Representations at Any Resolution (ECCV 2026) Tencent Hunyuan research
- LongLive-2.0: NVFP4 Parallel Infrastructure for Long Video Generation (NVIDIA, 1,220 HF upvotes) NVIDIA research
- Quantized Reasoning Models Think They Need to Think Longer, but They Do Not Meta research
- llama.cpp b9603: Qualcomm Adreno OpenCL Kernels for On-Device Inference ggml-org tools