paper — AI Digest

4 Jun NVIDIA Releases Cosmos 3: Open Omnimodal World Foundation Model for Physical AI NVIDIA research

30 Apr GLM-5V-Turbo: a natively multimodal foundation model for agents Z.ai research

13 May SenseNova-U1: Open-Source Unified Multimodal Understanding and Generation via NEO-unify SenseTime research

30 Apr Recursive Multi-Agent Systems: agent communication in latent space Stanford University research

2 May Eywa: heterogeneous collaboration framework between LLM agents and scientific foundation models University of Illinois at Urbana-Champaign research

3 May Exploration Hacking: LLMs Can Be Fine-Tuned to Strategically Resist RL Training research

3 May OpenAI Discloses How a 2.5%-User Reward Signal Gave GPT a Goblin Obsession Across Model Generations OpenAI research

3 May MiniCPM-o 4.5: Real-Time Full-Duplex Omni-Modal AI on Edge Devices OpenBMB / Tsinghua University research

5 May AI2 Open-Sources MolmoAct2: Robotics VLA That Claims to Beat GPT-5 on Embodied Reasoning AI2 research

5 May UniVidX: One Diffusion Backbone for RGB, Intrinsic Maps, and RGBA Video Generation research

6 May OpenAI Post-Mortem: How RLHF Reward Hacking Embedded Goblin Metaphors in GPT-5.x OpenAI research

13 May RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards Google research

14 May Asymmetric Flow Models: SOTA 1.57 FID on ImageNet via Rank-Asymmetric Velocity Parameterization Stanford University research

3 Jun Humanoid-GPT: Scaling to 2B Motion Frames Enables Zero-Shot Generalization in Humanoid Control research

25 Jun Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence research

26 Jun JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting Hao AI Lab, UC San Diego research

26 Jun Qwen-AgentWorld: Language World Models for General Agents at 35B and 397B Scale Qwen Team, Alibaba research

6 Jun MLEvolve: Self-Evolving Multi-Agent LLM Framework for Automated ML Algorithm Discovery research

14 Jun MiniMax Sparse Attention: 28× Compute Reduction at 1M-Token Context with No Quality Loss MiniMax research

14 Jun MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math MiniMax research

4 May Learning while Deploying: Fleet-Scale Reinforcement Learning Turns Robot Deployment into Continuous Training AGIBot research

6 May Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs research

7 May RLDX-1: Multi-Stream Action Transformer Achieves 86.8% on ALLEX Humanoid Tasks RLWRLD research

8 May AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4 Google DeepMind research

9 May OpenSearch-VL: Open Recipe for Training Frontier Multimodal Search Agents Tencent Hunyuan research

9 May ARIS: Autonomous ML Research via Adversarial Multi-Agent Collaboration Shanghai Jiao Tong University research

2 Jun Crafter: Multi-Agent Harness for Editable Scientific Figure Generation Scores +16pt Over Baselines (103 HF Upvotes) Tsinghua University research

2 Jun GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF Upvotes) University of Massachusetts Amherst research

4 Jun Echo-Infinity: Real-Time Infinite Video Generation via Learnable Memory Query research

4 Jun ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss research

6 Jun The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary research

6 Jun The Self-Correction Illusion: LLMs Fix Others' Errors but Not Their Own — Role Labels Are the Cause research

6 Jun Audio Interaction Model: Unified Streaming Framework Combining Offline and Real-Time Audio Instruction Following research

8 Jun Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning Carnegie Mellon University / Ohio State University research

14 Jun EvoArena: LLM Agents Score Only 40% on Dynamic Evolving Environments MIT / NUS / Salesforce research

14 Jun WeaveBench: Computer-Use Agents Fail at Hybrid GUI+CLI Tasks — 41% Pass Rate Microsoft Research research

14 Jun InterleaveThinker: RL Planner+Critic Pipeline for Interleaved Text-and-Image Generation CUHK Multimedia Lab research

16 Jun DreamX-World 1.0: General-Purpose Interactive World Model with 6DoF Camera Control AMAP-ML (Alibaba Maps AI Lab) research

16 Jun FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60% Microsoft / Shanghai Jiao Tong University research

18 Jun SAE Interventions Are Unreliable: Suppressed Behaviors Recover Post-Intervention Hong Kong Polytechnic University research

25 Jun Quantized Reasoning Models Think They Need to Think Longer, but They Do Not Meta research

26 Jun The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary research

30 Apr TIDE: cross-architecture distillation for diffusion LLMs Peking University research

30 Apr Programming with Data: test-driven data engineering for self-improving LLMs OpenDataLab research

2 May ESamp: LLMs explore by latent distilling for semantic-novelty sampling ShanghaiTech University research

2 May CoPD: co-evolving policy distillation for unified multi-capability models research

5 May Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL Princeton University research

5 May Meta Publishes Preparedness Report for Code World Model Before Open-Weight Release Meta research

13 May World Action Models: First Systematic Survey of Embodied Foundation Models Unifying World Modeling and Action OpenMOSS research

14 May AnyFlow: Any-Step Video Diffusion with On-Policy Flow Map Distillation MIT / NVIDIA research

3 Jun TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large Samsung Research research

3 Jun Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference Google / CMU research

12 Jun InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation research

12 Jun EvoArena: LLM Agents Score Only 39.6% on Dynamic Evolving Environments Benchmark MIT research

12 Jun FORT-Searcher: Shortcut-Resistant Training Data Framework for Deep Search Agents research

12 Jun Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning research

25 Jun Are We Ready For an Agent-Native Memory System? SJTU Benchmarks 12 Architectures research

25 Jun Wan-Streamer v0.1: End-to-End Real-Time Interactive Foundation Model Under 550ms Latency Wan-AI research

25 Jun DomainShuttle: Subject-Driven Text-to-Video Across In-Domain and Cross-Domain Scenarios research

4 May Intern-Atlas: 1M-Paper Methodology Evolution Graph as Research Infrastructure for AI Scientists research

6 May HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL research

7 May LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents Shanghai Jiao Tong University research

7 May Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic research

8 May Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and a Fix research

9 May Direct Corpus Interaction: Rethinking Retrieval for Agentic Search TIGER-Lab research

9 May Cola DLM: Continuous Latent Diffusion Language Model with Competitive Scaling research

13 May Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation research

3 Jun QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains research

3 Jun Quantifying Faithful Confidence Expression in Large Reasoning Models Yale NLP research

8 Jun SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory research

8 Jun Code2LoRA: Hypernetwork Generates Repo-Specific Adapters for Code LMs with Zero Inference Overhead University of Waterloo research

8 Jun VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding Yale University research

16 Jun Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23% National University of Singapore research

18 Jun Diffusion-Proof: Formal Theorem Proving via Diffusion Language Models research

18 Jun DreamReasoner-8B: Block-Size Curriculum for Diffusion Reasoning Models research

19 Jun StylisticBias: 15 Visual Attributes Account for 80% of Social Bias in Multimodal LLMs research

19 Jun Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agent Loops research

26 Jun Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models research

26 Jun OPRD: On-Policy Representation Distillation for Post-Training LLMs research

28 Apr Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond HKUST/NUS/Oxford/NTU research

28 Apr World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Microsoft Research research

28 Apr LLM Safety From Within (SIREN) University of Toronto CSSLab / McGill / LMU Munich research