DanceOPD: On-Policy Generative Field Distillation for Unified Image Generation
ByteDance Seed
DanceOPD treats each image generation capability (text-to-image, local editing, global editing) as a velocity field and distills all three into a single student flow-matching model via on-policy sampling. For each training sample, the student routes to one frozen capability field, queries it at a low-noise state reached by the student's own sampling (the on-policy state), and matches the resulting velocity with a local MSE loss. This per-sample routing avoids interference between capabilities. Editing scores improve by up to 21.9% in specific categories, while text-to-image metrics are preserved or improved by up to 2.0%. The paper has 64 upvotes on HF Daily Papers.
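The training step described above can be illustrated with a toy sketch. This is not the paper's implementation: the linear fields, the scalar student weight, the names `TEACHERS`, `rollout_student`, and `opd_loss`, the Euler rollout, and the convention that t = 1 is pure noise and t = 0 is data are all assumptions made for illustration.

```python
# Hypothetical frozen teacher fields: each maps (state, time) -> velocity.
# Toy linear fields stand in for real capability models.
def make_linear_field(scale):
    def field(x, t):
        return [scale * xi for xi in x]
    return field

TEACHERS = {
    "t2i": make_linear_field(1.0),
    "local_edit": make_linear_field(0.5),
    "global_edit": make_linear_field(-0.5),
}

def student_field(x, t, weight):
    # Toy student: one scalar weight instead of a neural network.
    return [weight * xi for xi in x]

def rollout_student(noise, weight, t_low, num_steps):
    """Euler-integrate the student's own field from t=1 down to t_low.

    The state reached this way is "on-policy": it comes from the
    student's own trajectory, not from a noised ground-truth image.
    """
    x = list(noise)
    t = 1.0
    dt = (1.0 - t_low) / num_steps
    for _ in range(num_steps):
        v = student_field(x, t, weight)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
        t -= dt
    return x, t

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def opd_loss(task, noise, weight, t_low=0.2, num_steps=4):
    # 1) Reach a low-noise state by following the student itself.
    x_t, t = rollout_student(noise, weight, t_low, num_steps)
    # 2) Route to exactly one frozen teacher field for this sample.
    target = TEACHERS[task](x_t, t)
    # 3) Match the student's velocity to the teacher's at that state.
    pred = student_field(x_t, t, weight)
    return mse(pred, target)
```

In this toy setup the loss is zero when the student's weight equals the routed teacher's scale and positive otherwise. In a real system the teacher output would be treated as a constant target and the loss minimized by gradient descent on the student's parameters.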
Why it matters
Unifying diverse generative capabilities without catastrophic forgetting is a long-standing challenge in image generation. DanceOPD's on-policy distillation approach is architecturally clean, and it reports gains on both editing capabilities without degrading text-to-image quality.
Importance: 2/5
64 upvotes on HF Daily Papers; a clean solution to multi-capability distillation in image generation from ByteDance Seed.