#scalable-oversight
- Automated Weak-to-Strong Researcher: AI Agents Outperform Humans on Alignment Research (Anthropic research)
- The Verification Horizon: No Single Reward Function Works for Coding Agents at Scale (Qwen, Alibaba research)
- Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight (Rutgers University research)