-
NVIDIA Releases Cosmos 3: Open Omnimodal World Foundation Model for Physical AI
NVIDIA
research
-
GLM-5V-Turbo: A Natively Multimodal Foundation Model for Agents
Z.ai
research
-
SenseNova-U1: Open-Source Unified Multimodal Understanding and Generation via NEO-unify
SenseTime
research
-
Recursive Multi-Agent Systems: Agent Communication in Latent Space
Stanford University
research
-
Eywa: Heterogeneous Collaboration Framework Between LLM Agents and Scientific Foundation Models
University of Illinois at Urbana-Champaign
research
-
Exploration Hacking: LLMs Can Be Fine-Tuned to Strategically Resist RL Training
research
-
OpenAI Discloses How a 2.5%-User Reward Signal Gave GPT a Goblin Obsession Across Model Generations
OpenAI
research
-
MiniCPM-o 4.5: Real-Time Full-Duplex Omni-Modal AI on Edge Devices
OpenBMB / Tsinghua University
research
-
AI2 Open-Sources MolmoAct2: Robotics VLA That Claims to Beat GPT-5 on Embodied Reasoning
AI2
research
-
UniVidX: One Diffusion Backbone for RGB, Intrinsic Maps, and RGBA Video Generation
research
-
OpenAI Post-Mortem: How RLHF Reward Hacking Embedded Goblin Metaphors in GPT-5.x
OpenAI
research
-
RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards
Google
research
-
Asymmetric Flow Models: SOTA 1.57 FID on ImageNet via Rank-Asymmetric Velocity Parameterization
Stanford University
research
-
Humanoid-GPT: Scaling to 2B Motion Frames Enables Zero-Shot Generalization in Humanoid Control
research
-
Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
research
-
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
Hao AI Lab, UC San Diego
research
-
Qwen-AgentWorld: Language World Models for General Agents at 35B and 397B Scale
Qwen Team, Alibaba
research
-
MLEvolve: Self-Evolving Multi-Agent LLM Framework for Automated ML Algorithm Discovery
research
-
MiniMax Sparse Attention: 28× Compute Reduction at 1M-Token Context with No Quality Loss
MiniMax
research
-
MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math
MiniMax
research
-
Learning while Deploying: Fleet-Scale Reinforcement Learning Turns Robot Deployment into Continuous Training
AGIBot
research
-
Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs
research
-
RLDX-1: Multi-Stream Action Transformer Achieves 86.8% on ALLEX Humanoid Tasks
RLWRLD
research
-
AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4
Google DeepMind
research
-
OpenSearch-VL: Open Recipe for Training Frontier Multimodal Search Agents
Tencent Hunyuan
research
-
ARIS: Autonomous ML Research via Adversarial Multi-Agent Collaboration
Shanghai Jiao Tong University
research
-
Crafter: Multi-Agent Harness for Editable Scientific Figure Generation Scores 16 Points Above Baselines (103 HF Upvotes)
Tsinghua University
research
-
GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF Upvotes)
University of Massachusetts Amherst
research
-
Echo-Infinity: Real-Time Infinite Video Generation via Learnable Memory Query
research
-
ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss
research
-
The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary
research
-
The Self-Correction Illusion: LLMs Fix Others' Errors but Not Their Own — Role Labels Are the Cause
research
-
Audio Interaction Model: Unified Streaming Framework Combining Offline and Real-Time Audio Instruction Following
research
-
Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning
Carnegie Mellon University / Ohio State University
research
-
EvoArena: LLM Agents Score Only 40% on Dynamic Evolving Environments
MIT / NUS / Salesforce
research
-
WeaveBench: Computer-Use Agents Fail at Hybrid GUI+CLI Tasks — 41% Pass Rate
Microsoft Research
research
-
InterleaveThinker: RL Planner+Critic Pipeline for Interleaved Text-and-Image Generation
CUHK Multimedia Lab
research
-
DreamX-World 1.0: General-Purpose Interactive World Model with 6DoF Camera Control
AMAP-ML (Alibaba Maps AI Lab)
research
-
FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60%
Microsoft / Shanghai Jiao Tong University
research
-
SAE Interventions Are Unreliable: Suppressed Behaviors Recover Post-Intervention
Hong Kong Polytechnic University
research
-
Quantized Reasoning Models Think They Need to Think Longer, but They Do Not
Meta
research
-
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
research
-
TIDE: Cross-Architecture Distillation for Diffusion LLMs
Peking University
research
-
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs
OpenDataLab
research
-
ESamp: LLMs Explore by Latent Distilling for Semantic-Novelty Sampling
ShanghaiTech University
research
-
CoPD: Co-Evolving Policy Distillation for Unified Multi-Capability Models
research
-
Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL
Princeton University
research
-
Meta Publishes Preparedness Report for Code World Model Before Open-Weight Release
Meta
research
-
World Action Models: First Systematic Survey of Embodied Foundation Models Unifying World Modeling and Action
OpenMOSS
research
-
AnyFlow: Any-Step Video Diffusion with On-Policy Flow Map Distillation
MIT / NVIDIA
research
-
TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large
Samsung Research
research
-
Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference
Google / CMU
research
-
InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation
research
-
EvoArena: LLM Agents Score Only 39.6% on Dynamic Evolving Environments Benchmark
MIT
research
-
FORT-Searcher: Shortcut-Resistant Training Data Framework for Deep Search Agents
research
-
Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning
research
-
Are We Ready For an Agent-Native Memory System? SJTU Benchmarks 12 Architectures
research
-
Wan-Streamer v0.1: End-to-End Real-Time Interactive Foundation Model Under 550ms Latency
Wan-AI
research
-
DomainShuttle: Subject-Driven Text-to-Video Across In-Domain and Cross-Domain Scenarios
research
-
Intern-Atlas: 1M-Paper Methodology Evolution Graph as Research Infrastructure for AI Scientists
research
-
HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL
research
-
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
Shanghai Jiao Tong University
research
-
Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic
research
-
Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and a Fix
research
-
Direct Corpus Interaction: Rethinking Retrieval for Agentic Search
TIGER-Lab
research
-
Cola DLM: Continuous Latent Diffusion Language Model with Competitive Scaling
research
-
Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation
research
-
QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains
research
-
Quantifying Faithful Confidence Expression in Large Reasoning Models
Yale NLP
research
-
SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory
research
-
Code2LoRA: Hypernetwork Generates Repo-Specific Adapters for Code LMs with Zero Inference Overhead
University of Waterloo
research
-
VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding
Yale University
research
-
Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23%
National University of Singapore
research
-
Diffusion-Proof: Formal Theorem Proving via Diffusion Language Models
research
-
DreamReasoner-8B: Block-Size Curriculum for Diffusion Reasoning Models
research
-
StylisticBias: 15 Visual Attributes Account for 80% of Social Bias in Multimodal LLMs
research
-
Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agent Loops
research
-
Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models
research
-
OPRD: On-Policy Representation Distillation for Post-Training LLMs
research
-
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
HKUST / NUS / Oxford / NTU
research
-
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
Microsoft Research
research
-
LLM Safety From Within (SIREN)
University of Toronto CSSLab / McGill / LMU Munich
research