#training-dynamics
- On the Geometry of On-Policy Distillation: A Training Paradigm Distinct from SFT and RLVR (Hong Kong University of Science and Technology) | research
- Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models | research