#reinforcement-learning
- Prime Intellect Releases prime-rl v0.6.0 for Agentic RL on Trillion-Parameter MoE Models | Prime Intellect | research
- Qwen-AgentWorld: Language World Models for General Agents across Seven Environments | Alibaba/Qwen | research
- Sakana AI Releases Fugu: Multi-LLM Orchestrator Achieving SoTA on SWE-Bench Pro | Sakana AI | research
- DeepReinforce Releases Ornith-1.0: Open-Source Coding Models That Learn Their Own RL Scaffolds | DeepReinforce | tools
- MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math | MiniMax | research
- DreamX-World 1.0: General-Purpose Interactive World Model with 6DoF Camera Control | AMAP-ML (Alibaba Maps AI Lab) | research
- FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60% | Microsoft / Shanghai Jiao Tong University | research
- Playful Agentic Robot Learning: Self-Directed Play Yields Transferable Robot Skills | UC Berkeley | research
- Tencent Hunyuan Open-Sources UniRL: Unified RL Post-Training for LLMs and Diffusion Models | Tencent / Hunyuan | research