#post-training
- DRPO: Rethinking Divergence Regularization in LLM Reinforcement Learning (Tencent Hunyuan; research)
- Anatomy of Post-Training: Using Interpretability to Audit and Fix Preference Data (research)
- OPRD: On-Policy Representation Distillation for Post-Training LLMs (research)
- Tencent Hunyuan Open-Sources UniRL: Unified RL Post-Training for LLMs and Diffusion Models (Tencent / Hunyuan; research)