Tencent Hunyuan Open-Sources UniRL: Unified RL Post-Training for LLMs and Diffusion Models
Tencent / Hunyuan
Tencent's Hunyuan team released UniRL, an open-source framework for unified reinforcement learning (RL) post-training across LLMs, vision-language models, and diffusion/flow-matching models. It implements a single loop (generate, score, compute advantages, update, sync) that is shared across heterogeneous model families. Two algorithms ship with it: Flow-DPPO, for diffusion/flow models, which uses trust-region masks based on exact divergence; and DRPO, for LLMs, which uses a smoothed advantage-weighted quadratic regularizer.
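To make the five-stage loop concrete, here is a minimal sketch in plain Python. It is not UniRL's API: the function names (`rl_step`, `run`), the toy categorical policy, and the 0/1 reward are all hypothetical, and a plain REINFORCE update with batch-normalized advantages stands in for Flow-DPPO and DRPO, whose details are not given here. The point is only the shape of the loop, in which a rollout copy of the policy generates samples, a scorer rates them, and a trainable copy is updated and then synced back.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rl_step(train_logits, rollout_logits, reward_fn, rng, batch=64, lr=0.5):
    """One generate -> score -> advantage -> update -> sync iteration (toy)."""
    n = len(train_logits)

    # 1. Generate: sample actions from the rollout copy of the policy.
    rollout_probs = softmax(rollout_logits)
    actions = rng.choices(range(n), weights=rollout_probs, k=batch)

    # 2. Score: rate every sample with the (hypothetical) reward function.
    rewards = [reward_fn(a) for a in actions]

    # 3. Advantage: center and scale rewards within the batch.
    mean = sum(rewards) / batch
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / batch) or 1.0
    advantages = [(r - mean) / std for r in rewards]

    # 4. Update: REINFORCE gradient of log-prob w.r.t. the trainable logits.
    train_probs = softmax(train_logits)
    grad = [0.0] * n
    for a, adv in zip(actions, advantages):
        for k in range(n):
            indicator = 1.0 if k == a else 0.0
            grad[k] += adv * (indicator - train_probs[k]) / batch
    new_train = [w + lr * g for w, g in zip(train_logits, grad)]

    # 5. Sync: copy the updated weights back to the rollout policy.
    return new_train, list(new_train)


def run(steps=200, seed=0, n_actions=4, target=2):
    """Train the toy policy to prefer `target`; returns (train, rollout) logits."""
    rng = random.Random(seed)
    train = [0.0] * n_actions
    rollout = list(train)
    reward_fn = lambda a: 1.0 if a == target else 0.0
    for _ in range(steps):
        train, rollout = rl_step(train, rollout, reward_fn, rng)
    return train, rollout
```

Calling `run()` and passing the returned training logits through `softmax` should show most of the probability mass on the rewarded action. In a real system the generate and update stages typically run on different engines, which is why the explicit sync step exists at all; here the two copies are just separate lists.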
Why it matters
RL post-training has become the dominant route to frontier model quality. UniRL is one of the first public frameworks to unify this pipeline across text, vision, and image-generation model families in a single codebase.
Importance: 2/5
One of the first public frameworks, released by Tencent Hunyuan, to unify RL post-training across LLMs and diffusion models.