-
OpenAI Previews GPT-5.6 Family: Sol, Terra, and Luna in Government-Gated Limited Release
OpenAI
models-llm
-
VibeThinker-3B Reaches Frontier-Level Reasoning Benchmarks via Curriculum RL
WeiboAI
research
-
Recursive Multi-Agent Systems: Agent Communication in Latent Space
Stanford University
research
-
Zyphra Releases ZAYA1-8B: Open Reasoning MoE Model Trained on AMD Hardware
Zyphra
models-llm
-
Google DeepMind's AI Co-Mathematician Reaches 48% on FrontierMath Tier 4
Google DeepMind
research
-
RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards
Google
research
-
SU-01: Gold-Medal-Level Olympiad Reasoning via Curriculum SFT and Two-Stage RL
SU-01 Team
research
-
SOOHAK: Frontier LLMs Solve Hard Math But Fail to Recognize Unsolvable Problems
research
-
Code as Agent Harness: Survey Positions Code as the Substrate for Executable Agent Systems (159 HF upvotes)
Multi-institution (42 authors)
research
-
SkillsVote: Lifecycle Governance of Agent Skills — Collection, Recommendation, Evolution (219 HF upvotes)
Memtensor Research Group / IAAR-Shanghai
research
-
Grok 4.3 Now Available on Amazon Bedrock with 1M-Token Context
xAI
models-llm
-
Qwen-AgentWorld: Language World Models for General Agents at 35B and 397B Scale
Qwen Team, Alibaba
research
-
RoPE Provably Fails at Long Contexts: Locality Bias and Token Consistency Both Break
research
-
DRPO: Rethinking Divergence Regularization in LLM Reinforcement Learning
Tencent Hunyuan
research
-
MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math
MiniMax
research
-
Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs
research
-
AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4
Google DeepMind
research
-
SDAR: Self-Distilled Agentic Reinforcement Learning for Multi-Turn Agents
Zhejiang University / Meituan
research
-
MMSkills: Reusable Multimodal Skills for General Visual Agents (105 HF upvotes)
Shanghai Jiao Tong University
research
-
GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF upvotes)
University of Massachusetts Amherst
research
-
ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss
research
-
The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary
research
-
The Self-Correction Illusion: LLMs Fix Others' Errors but Not Their Own — Role Labels Are the Cause
research
-
GitHub Copilot Gets 1M Token Context Window and Configurable Reasoning Levels
GitHub / Microsoft
tools
-
Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning
Carnegie Mellon University / Ohio State University
research
-
Arbor: Generalist Autonomous ML Research via Hypothesis-Tree Refinement
NLPIR Lab
research
-
DeNovoSWE: Full Repository Generation Jumps from 5.8% to 47.2% with Synthetic Training Data
AweAI Team
research
-
Z-Reward: Score Distributions Instead of Scalar Rewards for Image Generation RLHF
Alibaba
research
-
Quantized Reasoning Models Think They Need to Think Longer, but They Do Not
Meta
research
-
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
research
-
ESamp: LLMs Explore by Latent Distilling for Semantic-Novelty Sampling
ShanghaiTech University
research
-
Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL
Princeton University
research
-
Soohak: 64 Mathematicians Build Research-Level Benchmark That Stumps Frontier LLMs
Seoul National University
research
-
AutoTTS: LLM Agents Automatically Discover Test-Time Scaling Strategies for $40
research
-
TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large
Samsung Research
research
-
Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference
Google / CMU
research
-
InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation
research
-
Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning
research
-
HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL
research
-
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
Shanghai Jiao Tong University
research
-
Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic
research
-
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized AI Research Automation
Shanghai AI Lab
research
-
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy with Hierarchical Memory
research
-
Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation
research
-
BetaPRM: Uncertainty-Aware Process Rewards Cut Reasoning Token Use by 33%
research
-
NudgeRL: Strategy-Level Context Nudges for Efficient RLVR Exploration
KAIST AI
research
-
QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains
research
-
Quantifying Faithful Confidence Expression in Large Reasoning Models
Yale NLP
research
-
SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory
research
-
VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding
Yale University
research
-
Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight
Rutgers University
research
-
SearchSwarm: Delegation Intelligence for LLM Agents in Long-Horizon Deep Research
research
-
Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23%
National University of Singapore
research
-
ZPPO: Teacher-in-Prompts Knowledge Distillation Outperforms Gradient Methods for Small Reasoners
NVIDIA
research
-
Diffusion-Proof: Formal Theorem Proving via Diffusion Language Models
research
-
DreamReasoner-8B: Block-Size Curriculum for Diffusion Reasoning Models
research
-
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence in VLMs
Nanyang Technological University
research
-
Agentic Transformers Provably Learn to Search via Reinforcement Learning
research
-
Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models
research
-
OPRD: On-Policy Representation Distillation for Post-Training LLMs
research