Exploring the Design Space of Reward Backpropagation for Flow Matching
Researchers have introduced FlowBP, a new framework designed to improve the alignment of flow-matching text-to-image models with human preferences. The method addresses two limitations of direct reward backpropagation, memory constraints and gradient inflation, by constructing a surrogate backward trajectory. FlowBP offers three variants that bound memory usage and limit gradient chaining, and the authors report improvements across various metrics on models such as SD3.5-M and FLUX.
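To make "direct reward backpropagation" and "gradient chaining" concrete, here is a toy sketch. It is not FlowBP. It shows plain reward backpropagation through an Euler sampler, plus the common truncation trick of letting gradients flow only through the last few sampling steps. The linear velocity field `theta * x`, the quadratic reward, and the names `euler_sample_with_grad` and `reward_and_grad` are all hypothetical. Gradients are tracked in forward mode only to keep the example dependency-free. Real implementations use reverse-mode autodiff, whose memory cost grows with the number of sampling steps kept in the graph.

```python
def euler_sample_with_grad(x0, theta, num_steps, grad_steps=None):
    """Integrate dx/dt = theta * x from t=0 to t=1 with Euler steps.

    Returns the final sample x and its derivative dx/dtheta. If grad_steps
    is set, only the last `grad_steps` steps contribute to the derivative
    (a stop-gradient on earlier states), which limits gradient chaining.
    """
    if grad_steps is None:
        grad_steps = num_steps
    if not 1 <= grad_steps <= num_steps:
        raise ValueError("grad_steps must be in [1, num_steps]")
    dt = 1.0 / num_steps
    first_grad_step = num_steps - grad_steps
    x, dx = x0, 0.0  # dx tracks d x / d theta (forward-mode tangent)
    for k in range(num_steps):
        if k == first_grad_step:
            dx = 0.0  # stop-gradient: treat the current state as a constant
        v = theta * x         # hypothetical linear velocity field
        dv = x + theta * dx   # product rule: d(theta * x) / d theta
        x, dx = x + dt * v, dx + dt * dv
    return x, dx


def reward_and_grad(x0, theta, target, num_steps, grad_steps=None):
    """Hypothetical reward r(x) = -(x - target)^2 and its gradient w.r.t. theta."""
    x, dx = euler_sample_with_grad(x0, theta, num_steps, grad_steps)
    reward = -(x - target) ** 2
    return reward, -2.0 * (x - target) * dx  # chain rule through the sample


if __name__ == "__main__":
    _, g_full = reward_and_grad(1.0, 0.5, 3.0, num_steps=10)
    _, g_trunc = reward_and_grad(1.0, 0.5, 3.0, num_steps=10, grad_steps=2)
    print(f"full-chain gradient:   {g_full:.4f}")
    print(f"last-2-steps gradient: {g_trunc:.4f}")
```

In this linear toy, keeping only the last K of N steps scales the gradient by exactly K/N. That is a simple illustration of how the length of the gradient chain drives gradient magnitude. The sketch does not reproduce FlowBP's surrogate backward trajectory or its three variants.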
Impact: Introduces a novel framework to improve the efficiency and effectiveness of aligning generative models with human preferences.