Brief

[5/5] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
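
The page does not publish its scoring formula. The Python below is a rough sketch of how a 0–100 score could combine the four signals. It assumes each signal is already normalized to [0, 1], takes a weighted sum using hypothetical weights, and multiplies that sum by an exponential time-decay factor with an assumed 72-hour half-life. The names `Story`, `WEIGHTS`, and `HALF_LIFE_HOURS` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical weights and half-life: the digest does not publish its formula.
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 72.0

@dataclass
class Story:
    authority: float         # source authority, assumed normalized to [0, 1]
    cluster_strength: float  # e.g. how many sources cover the story, normalized to [0, 1]
    headline_signal: float   # headline-based relevance, normalized to [0, 1]
    age_hours: float         # hours since the story first appeared

def clamp01(x: float) -> float:
    # Keep each signal inside [0, 1] so the final score stays within 0-100.
    return min(1.0, max(0.0, x))

def score(story: Story) -> float:
    """Weighted sum of the three signals, decayed by age, scaled to 0-100."""
    base = (WEIGHTS["authority"] * clamp01(story.authority)
            + WEIGHTS["cluster_strength"] * clamp01(story.cluster_strength)
            + WEIGHTS["headline_signal"] * clamp01(story.headline_signal))
    # Exponential decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (max(story.age_hours, 0.0) / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Because the sketch multiplies by the decay factor instead of adding it as a fourth weighted term, even a strong cluster fades steadily with age. This is a design choice of the sketch, not something the page states.
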

TOOL · arXiv cs.LG English(EN) · 5d

Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

Researchers have introduced Linear-DPO, a method for aligning text-to-image generative models. It generalizes the Direct Preference Optimization (DPO) objective to cover both diffusion and flow-matching models within a single framework. Linear-DPO replaces the standard sigmoid-based utility function with a linear one and uses an EMA-updated reference model. With these changes, it outperforms existing methods on the diffusion models SD1.5 and SDXL and on the flow-matching model SD3-Medium.

IMPACT Introduces a more effective alignment technique for text-to-image models, potentially improving their adherence to user prompts.
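
The Python below contrasts a sigmoid utility with a linear one. It is a minimal sketch under stated assumptions and does not reproduce the paper's exact objective. The margin follows the Diffusion-DPO style: it compares squared prediction errors of the policy and the reference on a preferred (w) and a dispreferred (l) image, with timestep weighting folded into `beta`. The "linear" loss is assumed to be the negated margin. The EMA rule is the generic one, and its decay value is an assumption. All function names are hypothetical.

```python
import numpy as np

def dpo_margin(err_w, err_w_ref, err_l, err_l_ref, beta=1.0):
    """Implicit reward margin in the style of Diffusion-DPO.

    err_* are squared noise- (or velocity-) prediction errors on the preferred (w)
    and dispreferred (l) image at the same timestep, for the policy and the
    reference model. Lower error relative to the reference acts as higher likelihood.
    """
    return -beta * ((err_w - err_w_ref) - (err_l - err_l_ref))

def sigmoid_dpo_loss(margin):
    # Standard DPO: -log(sigmoid(margin)), computed stably as softplus(-margin).
    return np.logaddexp(0.0, -margin)

def linear_dpo_loss(margin):
    # Assumed linear utility: the loss is the negated margin, so its gradient
    # with respect to the margin is a constant -1 and never saturates.
    return -margin

def ema_update(ref_params, policy_params, decay=0.99):
    # The reference model slowly tracks the policy instead of staying frozen.
    return [decay * r + (1.0 - decay) * p for r, p in zip(ref_params, policy_params)]
```

The derivative of the sigmoid loss with respect to the margin is −σ(−margin). This shrinks toward zero once a pair is well separated. The linear loss, by contrast, keeps pushing at the same rate however large the margin becomes.
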
RESEARCH · arXiv cs.AI English(EN) · 4d · [2 sources]

Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution

Researchers have developed ASASR, a framework for image super-resolution that aims to improve the faithfulness of generated images. It targets spectral misalignment in current generative models by recasting the generative flow in a Sobolev-induced Riemannian geometry. ASASR uses a parametric adversary to synthesize targeted negative samples. These samples guide optimization toward spectral consistency and structural fidelity, which reduces artifacts.

IMPACT Enhances image restoration fidelity by addressing spectral misalignment in generative models.
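
"Sobolev" here refers to norms that weight errors by frequency. The Python below computes a generic squared H^s distance between two images in the Fourier domain, weighting each error coefficient by (1 + |k|²)^s with angular frequency k. It is only an illustration of what a Sobolev-weighted spectral error measures. ASASR's Riemannian reformulation and its parametric adversary are not reproduced. The function name `sobolev_sq_error` is hypothetical.

```python
import numpy as np

def sobolev_sq_error(pred, target, s=1.0):
    """Squared H^s (Sobolev) distance between two 2-D images via the FFT.

    Each Fourier coefficient of the error is weighted by (1 + |k|^2)^s, so for s > 0
    high-frequency mismatches (fine texture, ringing) cost more than in plain MSE.
    With s = 0 and an orthonormal FFT this equals the sum of squared pixel errors.
    """
    diff = np.fft.fft2(pred - target, norm="ortho")
    ky = 2 * np.pi * np.fft.fftfreq(pred.shape[0])  # angular frequency along rows
    kx = 2 * np.pi * np.fft.fftfreq(pred.shape[1])  # angular frequency along columns
    k2 = ky[:, None] ** 2 + kx[None, :] ** 2
    weights = (1.0 + k2) ** s
    return float(np.sum(weights * np.abs(diff) ** 2))
```

Raising `s` shifts the penalty toward high frequencies. A loss of this kind is therefore one way to make "spectral consistency" concrete.
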
TOOL · arXiv cs.IR (Information Retrieval) English(EN) · 4d

TPMM-DPO: Trajectory-aware Preference-guided Model Merging for Iterative Direct Preference Optimization

Researchers have introduced TPMM-DPO, a method for aligning large language models that addresses error accumulation in iterative Direct Preference Optimization. It treats the sequence of policy models produced across iterations as an optimization trajectory and adaptively merges them with learned weights to build a more stable, robust reference model. In experiments, TPMM-DPO improves generation quality and outperforms standard iterative DPO by mitigating the degradation seen in later training stages.

IMPACT Improves LLM alignment stability and performance by mitigating error accumulation in iterative training.
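
The summary does not describe how TPMM-DPO learns its weights. The Python below therefore sketches only the merging step, under three assumptions: checkpoints are dicts of NumPy arrays with identical shapes; each checkpoint gets one logit; and a softmax over the logits makes the merged reference a convex combination of the trajectory. Here the logits are simply given rather than learned, and the function name `merge_trajectory` is hypothetical.

```python
import numpy as np

def merge_trajectory(checkpoints, logits):
    """Merge a trajectory of policy checkpoints into one reference model.

    checkpoints: list of dicts mapping parameter name -> np.ndarray (same shapes).
    logits: one unnormalized weight per checkpoint (given here, not learned).
    """
    logits = np.asarray(logits, dtype=float)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    # Weighted average of every parameter tensor across the trajectory.
    return {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
        for name in checkpoints[0]
    }
```

With equal logits, this reduces to plain checkpoint averaging. Raising the logits of later checkpoints shifts the reference toward the most recent policy.
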
RESEARCH · arXiv cs.AI English(EN) · 3w · [6 sources]

TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization

Researchers are exploring ways to align large language models with human preferences that go beyond traditional Reinforcement Learning from Human Feedback (RLHF). Direct Preference Optimization (DPO) is simpler to implement than RLHF but has theoretical limitations. Papers in this cluster introduce refinements such as Constrained Preference Optimization (CPO) and Topology- and Uncertainty-Aware DPO (TUR-DPO) to address these shortcomings and strengthen alignment guarantees.

IMPACT New alignment techniques like CPO and TUR-DPO offer improved theoretical guarantees and empirical performance for LLMs.
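
For reference, the standard DPO objective that these refinements start from is shown below. In it, π_θ is the policy being trained and π_ref the frozen reference policy. (x, y_w, y_l) is a prompt with its preferred and dispreferred responses, σ is the logistic function, and β is a coefficient controlling how far the policy may move from the reference. The CPO and TUR-DPO modifications themselves are not reproduced here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \left[\log\sigma\!\left(
    \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
    - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
  \right)\right]
```
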
TOOL · Together AI blog English(EN) · 13mo

Together Fine-Tuning Platform, Now With Preference Optimization and Continued Training

Together AI has expanded its fine-tuning platform for continuously improving open-weight language models. The platform now supports preference optimization and continued training, so models can be adapted to user feedback and new data. A new web UI lets developers manage datasets, set training parameters, and monitor experiments directly from the browser.

IMPACT Enables easier and more continuous adaptation of open-weight models for specific applications.
