#efficiency
- Qwen-Image-2.0: Unified Image Generation and Editing at 2K Resolution, Top-1 on AI Arena Alibaba research
- Baidu Releases ERNIE 5.1 at 6% of Industry Pre-Training Cost, Enters Global Top-10 Search Baidu models-llm
- JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting Hao AI Lab, UC San Diego research
- Kwai Keye-VL-2.0: Open-Source 30B MoE Multimodal Model with 256K Context for Long Video Kwai research
- MiniMax Sparse Attention: 28× Compute Reduction at 1M-Token Context with No Quality Loss MiniMax research
- Moebius: 0.2B Lightweight Image Inpainting Framework Matches 11.9B FLUX Model Huazhong University of Science and Technology research
- Orthrus: 7.8× Inference Speedup for Qwen3 via Autoregressive-Diffusion KV Sharing research
- SANA-WM: Minute-Scale 720p World Modeling on a Single GPU NVIDIA research
- ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss research
- FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60% Microsoft / Shanghai Jiao Tong University research
- Quantized Reasoning Models Think They Need to Think Longer, but They Do Not Meta research
- Are We Ready For an Agent-Native Memory System? SJTU Benchmarks 12 Architectures research
- On the Geometry of On-Policy Distillation: A Training Paradigm Distinct from SFT and RLVR Hong Kong University of Science and Technology research
- SHERLOC: Structured Diagnostic Localization Cuts Code Repair Token Usage by 36.7% research
- OPRD: On-Policy Representation Distillation for Post-Training LLMs research