mech-interp — AI Digest

21 Jun How Transparent is DiffusionGemma? Interpretability Study Closes the Gap to Autoregressive Models Google DeepMind research
18 May Judge Circuits: Mechanistic Explanation of LLM-as-Judge Format Inconsistency research
11 Jun Anatomy of Post-Training: Using Interpretability to Audit and Fix Preference Data research