speculative-decoding — AI Digest

26 Jun · JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting · Hao AI Lab, UC San Diego · research
6 May · SGLang v0.5.11: Speculative Decoding V2 as Default and Eight New Model Architectures · tools
16 May · Orthrus: 7.8x Inference Speedup for Qwen3 via Autoregressive-Diffusion KV Sharing · research
18 May · vLLM v0.21.0: Blackwell MLA Backend, HMA KV Offload, Spec Decode for Reasoning Models · vLLM Project · tools
6 May · Ollama v0.23.1: Gemma 4 MTP Speculative Decoding Delivers 2× Speed on Apple Silicon · tools
17 Jun · llama.cpp June 16 Builds: Eagle3 Speculative Decoding, Vulkan UMA Memory, NVFP4 Fixes · tools