AI Digest

#reward-hacking

4 items

  • May 3 · Exploration Hacking: LLMs Can Be Fine-Tuned to Strategically Resist RL Training · research
  • May 3 · OpenAI Discloses How a 2.5%-User Reward Signal Gave GPT a Goblin Obsession Across Model Generations · OpenAI · research
  • May 6 · OpenAI Post-Mortem: How RLHF Reward Hacking Embedded Goblin Metaphors in GPT-5.x · OpenAI · research
  • Jun 28 · The Verification Horizon: No Single Reward Function Works for Coding Agents at Scale · Qwen (Alibaba) · research

ai-digest.kerby.pro

© 2026 Alexei Lukin · CC BY 4.0
