#speculative-decoding
- JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting (Hao AI Lab, UC San Diego; research)
- SGLang v0.5.11: Speculative Decoding V2 as Default and Eight New Model Architectures (tools)
- Orthrus: 7.8x Inference Speedup for Qwen3 via Autoregressive-Diffusion KV Sharing (research)
- vLLM v0.21.0: Blackwell MLA Backend, HMA KV Offload, Spec Decode for Reasoning Models (vLLM Project; tools)
- Ollama v0.23.1: Gemma 4 MTP Speculative Decoding Delivers 2× Speed on Apple Silicon (tools)
- llama.cpp June 16 Builds: Eagle3 Speculative Decoding, Vulkan UMA Memory, NVFP4 Fixes (tools)