reinforcement-learning — AI Digest

24 Jun Prime Intellect Releases prime-rl v0.6.0 for Agentic RL on Trillion-Parameter MoE Models Prime Intellect research
24 Jun Qwen-AgentWorld: Language World Models for General Agents across Seven Environments Alibaba/Qwen research
24 Jun Sakana AI Releases Fugu: Multi-LLM Orchestrator Achieving SoTA on SWE-Bench Pro Sakana AI research
26 Jun DeepReinforce Releases Ornith-1.0: Open-Source Coding Models That Learn Their Own RL Scaffolds DeepReinforce tools
14 Jun MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math MiniMax research
16 Jun DreamX-World 1.0: General-Purpose Interactive World Model with 6DoF Camera Control AMAP-ML (Alibaba Maps AI Lab) research
16 Jun FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60% Microsoft / Shanghai Jiao Tong University research
21 Jun Playful Agentic Robot Learning: Self-Directed Play Yields Transferable Robot Skills UC Berkeley research
28 Jun Tencent Hunyuan Open-Sources UniRL: Unified RL Post-Training for LLMs and Diffusion Models Tencent / Hunyuan research