training-dynamics — AI Digest

9 Jun On the Geometry of On-Policy Distillation: A Training Paradigm Distinct from SFT and RLVR Hong Kong University of Science and Technology research
26 Jun Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models research