reasoning — AI Digest

28 Jun · OpenAI Previews GPT-5.6 Family: Sol, Terra, and Luna in Government-Gated Limited Release · OpenAI · models-llm
17 Jun · VibeThinker-3B Reaches Frontier-Level Reasoning Benchmarks via Curriculum RL · WeiboAI · research
30 Apr · Recursive Multi-Agent Systems: agent communication in latent space · Stanford University · research
9 May · Zyphra Releases ZAYA1-8B: Open Reasoning MoE Model Trained on AMD Hardware · Zyphra · models-llm
10 May · Google DeepMind's AI Co-Mathematician Reaches 48% on FrontierMath Tier 4 · Google DeepMind · research
13 May · RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards · Google · research
15 May · SU-01: Gold-Medal-Level Olympiad Reasoning via Curriculum SFT and Two-Stage RL · SU-01 Team · research
18 May · SOOHAK: Frontier LLMs Solve Hard Math But Fail to Recognize Unsolvable Problems · research
20 May · Code as Agent Harness: Survey Positions Code as the Substrate for Executable Agent Systems (159 HF upvotes) · Multi-institution (42 authors) · research
20 May · SkillsVote: Lifecycle Governance of Agent Skills — Collection, Recommendation, Evolution (219 HF upvotes) · Memtensor Research Group / IAAR-Shanghai · research
18 Jun · Grok 4.3 Now Available on Amazon Bedrock with 1M-Token Context · xAI · models-llm
26 Jun · Qwen-AgentWorld: Language World Models for General Agents at 35B and 397B Scale · Qwen Team, Alibaba · research
18 May · RoPE Provably Fails at Long Contexts: Locality Bias and Token Consistency Both Break · research
10 Jun · DRPO: Rethinking Divergence Regularization in LLM Reinforcement Learning · Tencent Hunyuan · research
14 Jun · MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math · MiniMax · research
6 May · Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs · research
8 May · AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4 · Google DeepMind · research
16 May · SDAR: Self-Distilled Agentic Reinforcement Learning for Multi-Turn Agents · Zhejiang University / Meituan · research
19 May · MMSkills: Reusable Multimodal Skills for General Visual Agents (105 HF upvotes) · Shanghai Jiao Tong University · research
2 Jun · GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF upvotes) · University of Massachusetts Amherst · research
4 Jun · ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss · research
6 Jun · The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary · research
6 Jun · The Self-Correction Illusion: LLMs Fix Others' Errors but Not Their Own — Role Labels Are the Cause · research
8 Jun · GitHub Copilot Gets 1M-Token Context Window and Configurable Reasoning Levels · GitHub / Microsoft · tools
8 Jun · Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning · Carnegie Mellon University / Ohio State University · research
11 Jun · Arbor: Generalist Autonomous ML Research via Hypothesis-Tree Refinement · NLPIR Lab · research
11 Jun · DeNovoSWE: Full Repository Generation Jumps from 5.8% to 47.2% with Synthetic Training Data · AweAI Team · research
11 Jun · Z-Reward: Score Distributions Instead of Scalar Rewards for Image Generation RLHF · Alibaba · research
25 Jun · Quantized Reasoning Models Think They Need to Think Longer, but They Do Not · Meta · research
26 Jun · The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary · research
2 May · ESamp: LLMs explore by latent distilling for semantic-novelty sampling · ShanghaiTech University · research
5 May · Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL · Princeton University · research
11 May · Soohak: 64 Mathematicians Build Research-Level Benchmark That Stumps Frontier LLMs · Seoul National University · research
11 May · AutoTTS: LLM Agents Automatically Discover Test-Time Scaling Strategies for $40 · research
3 Jun · TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large · Samsung Research · research
3 Jun · Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference · Google / CMU · research
12 Jun · InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation · research
12 Jun · Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning · research
6 May · HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL · research
7 May · LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents · Shanghai Jiao Tong University · research
7 May · Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic · research
12 May · NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized AI Research Automation · Shanghai AI Lab · research
12 May · TMAS: Scaling Test-Time Compute via Multi-Agent Synergy with Hierarchical Memory · research
13 May · Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation · research
18 May · BetaPRM: Uncertainty-Aware Process Rewards Cut Reasoning Token Use by 33% · research
19 May · NudgeRL: Strategy-Level Context Nudges for Efficient RLVR Exploration · KAIST AI · research
3 Jun · QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains · research
3 Jun · Quantifying Faithful Confidence Expression in Large Reasoning Models · Yale NLP · research
8 Jun · SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory · research
8 Jun · VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding · Yale University · research
9 Jun · Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight · Rutgers University · research
10 Jun · SearchSwarm: Delegation Intelligence for LLM Agents in Long-Horizon Deep Research · research
16 Jun · Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23% · National University of Singapore · research
17 Jun · ZPPO: Teacher-in-Prompts Knowledge Distillation Outperforms Gradient Methods for Small Reasoners · NVIDIA · research
18 Jun · Diffusion-Proof: Formal Theorem Proving via Diffusion Language Models · research
18 Jun · DreamReasoner-8B: Block-Size Curriculum for Diffusion Reasoning Models · research
22 Jun · S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence in VLMs · Nanyang Technological University · research
23 Jun · Agentic Transformers Provably Learn to Search via Reinforcement Learning · research
26 Jun · Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models · research
26 Jun · OPRD: On-Policy Representation Distillation for Post-Training LLMs · research