Brief · PulseAugur

RESEARCH · arXiv cs.LG English(EN) · 16h · [2 sources]

Exploring the Design Space of Reward Backpropagation for Flow Matching

Researchers have introduced FlowBP, a framework for aligning text-to-image flow matching models with human preferences. It targets two limitations of direct reward backpropagation, memory constraints and gradient inflation, by constructing a surrogate backward trajectory. FlowBP comes in three variants that bound memory usage and limit gradient chaining, and it shows improvements across various metrics on models such as SD3.5-M and FLUX.

IMPACT Introduces a framework that makes reward-based alignment of generative models with human preferences more memory-efficient and more effective.

FLUX.1-dev
LeapAlign
FlowBP
SD3.5-M
FLUX.2-Klein-base