Tencent Hunyuan Open-Sources UniRL: Unified RL Post-Training for LLMs and Diffusion Models

Tencent / Hunyuan


Tencent's Hunyuan team released UniRL, an open-source framework for unified RL post-training across LLMs, vision-language models, and diffusion/flow-matching models. It implements a single generate-score-advantage-update-sync loop usable across heterogeneous model families. Two algorithms ship with it: Flow-DPPO for diffusion/flow models using trust-region masks based on exact divergence, and DRPO for LLMs with a smoothed advantage-weighted quadratic regularizer.
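The five-stage loop named above can be sketched in a few lines of Python. This is a toy illustration, not UniRL's actual API: `ToyPolicy`, `train_step`, the mean-reward baseline, and the reading of "sync" as copying trainer weights to a separate rollout copy are all assumptions made for the example. The policy is a one-parameter Gaussian so the sketch runs without any ML library.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for any trainable model (LLM, VLM, or diffusion)."""
    weight: float = 0.0

    def generate(self, prompt: float, rng: random.Random) -> float:
        # Sample an "output" from a unit-variance Gaussian centred on the weight.
        # The prompt is ignored in this toy model.
        return self.weight + rng.gauss(0.0, 1.0)

    def update(self, samples: List[float], advantages: List[float], lr: float) -> None:
        # REINFORCE-style step: for a unit-variance Gaussian, the gradient of
        # log-probability w.r.t. the mean is (sample - weight).
        grad = mean(a * (s - self.weight) for s, a in zip(samples, advantages))
        self.weight += lr * grad


def train_step(
    trainer: ToyPolicy,
    rollout: ToyPolicy,
    reward_fn: Callable[[float], float],
    prompts: List[float],
    rng: random.Random,
    lr: float = 0.1,
) -> float:
    samples = [rollout.generate(p, rng) for p in prompts]  # 1. generate
    rewards = [reward_fn(s) for s in samples]              # 2. score
    baseline = mean(rewards)                               # 3. advantage
    advantages = [r - baseline for r in rewards]           #    (mean baseline: an assumption)
    trainer.update(samples, advantages, lr)                # 4. update
    rollout.weight = trainer.weight                        # 5. sync weights to the rollout copy
    return baseline


if __name__ == "__main__":
    rng = random.Random(0)
    trainer, rollout = ToyPolicy(), ToyPolicy()
    for _ in range(200):
        # The reward peaks when the sampled output equals 3.0.
        train_step(trainer, rollout, lambda s: -(s - 3.0) ** 2, [0.0] * 64, rng)
    print(f"learned weight: {trainer.weight:.2f}")
```

Because only `generate` and `update` depend on the model, the same `train_step` could in principle drive a text or image generator. That is the general idea behind a unified loop, though how UniRL structures its interfaces is not described here.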

Why it matters

RL post-training has become a leading route to improving frontier model quality. UniRL is one of the first public frameworks to unify this pipeline across text, vision, and image-generation model families in a single codebase.

Importance: 2/5

Tencent Hunyuan's UniRL is one of the first public unified RL post-training frameworks spanning LLMs and diffusion models.

reinforcement-learning post-training open-source diffusion rlhf framework

Sources

official Tencent-Hunyuan/UniRL | GitHub