#mech-interp
3 items

21 Jun
How Transparent is DiffusionGemma? Interpretability Study Closes the Gap to Autoregressive Models
Google DeepMind, research

18 May
Judge Circuits: Mechanistic Explanation of LLM-as-Judge Format Inconsistency
research

11 Jun
Anatomy of Post-Training: Using Interpretability to Audit and Fix Preference Data
research