PulseAugur
EN
LIVE 03:55:00

New Drifting Preference Optimization fine-tunes one-step image generators

Researchers have developed Drifting Preference Optimization (DrPO), a new method for fine-tuning one-step text-to-image generative models. This technique allows for efficient preference tuning of deterministic one-step generators, which are desirable for their speed. DrPO synthesizes an update direction from high- and low-scoring image samples, enabling training with various reward functions without requiring differentiability. AI

IMPACT Enables faster and more flexible fine-tuning of one-step image generation models.

RANK_REASON The cluster contains a research paper detailing a new method for generative models.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

New Drifting Preference Optimization fine-tunes one-step image generators

COVERAGE [3]

  1. arXiv cs.LG TIER_1 English(EN) · Zhou Jiang, Yandong Wen, Zhen Liu ·

    Drifting Preference Optimization for One-Step Generative Models

    arXiv:2606.02521v1 Announce Type: new Abstract: One-step text-to-image generators are attractive for deployment because they generate an image with a single forward pass, but preference finetuning them remains difficult: standard alignment methods often rely on policy likelihoods…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    Drifting Preference Optimization for One-Step Generative Models

    One-step text-to-image generators are attractive for deployment because they generate an image with a single forward pass, but preference finetuning them remains difficult: standard alignment methods often rely on policy likelihoods, denoising trajectories, differentiable reward …

  3. arXiv cs.LG TIER_1 English(EN) · Zhen Liu ·

    Drifting Preference Optimization for One-Step Generative Models

    One-step text-to-image generators are attractive for deployment because they generate an image with a single forward pass, but preference finetuning them remains difficult: standard alignment methods often rely on policy likelihoods, denoising trajectories, differentiable reward …