Apple 的研究论文探讨了条件扩散模型中组合泛化机制,特别关注这些模型如何处理生成比训练时更多的对象的图像。研究确定“局部条件分数”是实现此能力的关键因素,表明成功进行长度泛化的模型表现出这些分数,而失败的模型则没有。研究还提出了一种强制执行这些局部分数的方法,该方法成功地使先前表现不佳的模型实现了长度泛化。
AI
Conditional diffusion models appear capable of compositional generalization, i.e., generating convincing samples for out-of-distribution combinations of conditioners, but the mechanisms underlying this ability remain unclear. To make this concrete, we study length generalization,…
arXiv:2606.12977v1 Announce Type: cross Abstract: Model fingerprinting, embedding user-specific identifiers (fingerprints) into generated outputs, has recently emerged as a popular solution to protect the intellectual property rights (IPR) of generative text-to-image (T2I) models…
arXiv:2606.13240v1 Announce Type: cross Abstract: A key strength of diffusion models lies in their flexibility, since their outputs can be controlled at sampling time through guidance. However, beyond simple cases such as conditional sampling, the target distribution is often lef…
Diffusion models have seen wide adoption for 3D molecular generation, yet they offer no principled signal of when a generated molecule is likely to be of low quality. We propose a post-hoc method for estimating per-sample uncertainty in pretrained molecular diffusion models. Buil…
Diffusion models have seen wide adoption for 3D molecular generation, yet they offer no principled signal of when a generated molecule is likely to be of low quality. We propose a post-hoc method for estimating per-sample uncertainty in pretrained molecular diffusion models. Buil…
Diffusion models have emerged as state-of-the-art generative models for high-fidelity image synthesis, particularly in their classifier-free guided and classifier-guided forms. However, standard classifier guidance concentrates probability mass around high-density class mean, lea…
Recent diffusion editors perform diverse instruction-based edits while conditioning on the source image at every denoising step. Yet persistent source-image conditioning can limit how fully an edit is executed and how natural the result appears, especially when the target scene d…
arXiv:2606.11552v1 Announce Type: new Abstract: Large language models (LLMs) achieve remarkable performance across a wide range of tasks, but their autoregressive decoding process incurs substantial inference costs due to inherently sequential token generation. Speculative decodi…
arXiv cs.LG
TIER_1English(EN)·Zhongxin Yang, Yuanwei Bin, Xiang I. A. Yang, Shiyi Chen·
arXiv:2606.11277v1 Announce Type: new Abstract: Reliable extrapolation remains a central challenge for generative models in computational physics, because models trained over finite ranges of time, parameters, or geometries may produce physically inconsistent predictions outside …
arXiv cs.LG
TIER_1English(EN)·Yi Hu, Leying Yi, Emily Davis, Finn Carter·
arXiv:2606.09901v1 Announce Type: cross Abstract: Diffusion-based generative models enable powerful image editing capabilities, but achieving precise control while maintaining fidelity and safety remains challenging. We present a comprehensive theoretical and empirical study of c…
arXiv:2310.05264v5 Announce Type: replace Abstract: In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion mo…
arXiv cs.AI
TIER_1English(EN)·Matina Mahdizadeh Sani, Nima Jamali, Mohammad Jalali, Farzan Farnia·
arXiv:2601.08379v2 Announce Type: replace-cross Abstract: Pre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such …
arXiv:2409.02426v5 Announce Type: replace Abstract: Despite their empirical success across a wide range of generative tasks, the fundamental principles underlying the ability of diffusion models to learn data distributions are poorly understood. In this work, we develop a new mat…
Large language models (LLMs) achieve remarkable performance across a wide range of tasks, but their autoregressive decoding process incurs substantial inference costs due to inherently sequential token generation. Speculative decoding addresses this bottleneck by employing a ligh…
Large language models (LLMs) achieve remarkable performance across a wide range of tasks, but their autoregressive decoding process incurs substantial inference costs due to inherently sequential token generation. Speculative decoding addresses this bottleneck by employing a ligh…
arXiv cs.LG
TIER_1English(EN)·Shigui Li, Delu Zeng·
arXiv:2606.07835v1 Announce Type: new Abstract: A fundamental tension exists in the large-step inference of diffusion models via their deterministic probability flow ordinary differential equation (PF-ODE) trajectories, which we identify as the contractivity trap: efficient infer…
arXiv cs.AI
TIER_1English(EN)·Danqi Zhuang, Jisui Huang, Xiaoyue Xi, Andrew Kiggins, Xiaojie Wang, Ke Chen, Yue Wu·
arXiv:2606.09816v1 Announce Type: cross Abstract: Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically powerful, it provides little explic…
arXiv:2606.09718v1 Announce Type: new Abstract: Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains less explored. Drawing inspirati…
arXiv cs.LG
TIER_1English(EN)·Meher Chaitanya, Sebastian Dalleiger, Luana Ruiz·
arXiv:2606.09340v1 Announce Type: new Abstract: Local Hyper-Flow Diffusion (HFD) gives an edge-size-independent Cheeger-type guarantee for seeded clustering in general submodular hypergraphs, but existing HFD solvers do not keep intermediate computation local at every iteration. …
arXiv cs.LG
TIER_1English(EN)·Emma Finn, Binxu Wang, T. Anderson Keller, Demba E. Ba·
arXiv:2606.08309v1 Announce Type: new Abstract: Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score…
arXiv:2512.15116v2 Announce Type: replace-cross Abstract: Multivariate time series imputation is fundamental in applications such as healthcare, traffic forecasting, and biological modeling, where sensor failures and irregular sampling lead to pervasive missing values. However, e…
Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically powerful, it provides little explicit structure for data concentrated near low-dimens…
Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically powerful, it provides little explicit structure for data concentrated near low-dimens…
Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains less explored. Drawing inspiration from self-supervised learning (SSL), we intro…
Local Hyper-Flow Diffusion (HFD) gives an edge-size-independent Cheeger-type guarantee for seeded clustering in general submodular hypergraphs, but existing HFD solvers do not keep intermediate computation local at every iteration. We introduce Thresholded Local HFD (TL-HFD), a f…
arXiv:2512.20963v3 Announce Type: replace Abstract: Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinctions between memorization and generalization in diffusion …
arXiv cs.AI
TIER_1English(EN)·Renzo G. Soatto, Anders Hoel, Greycen Ren, Shorna Alam, Stephen Bates, Nikolaos P. Daskalakis, Caroline Uhler, Maria Skoularidou·
arXiv:2604.03779v2 Announce Type: replace-cross Abstract: Diffusion models have excelled at generative tasks for both continuous and token-based domains, but their application to discrete ordinal data remains underdeveloped. We present CountsDiff, a diffusion framework designed t…
arXiv:2601.04791v4 Announce Type: replace-cross Abstract: While latent diffusion models (LDMs) have emerged as powerful priors for inverse problems, existing LDM-based solvers frequently suffer from instability. In this work, we first identify the instability as a discrepancy bet…
Token-subset representation alignment method called MaskAlign improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment behavior under perturbations.
Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score-approximation network in such diffusion modelin…
arXiv:2602.07875v3 Announce Type: replace Abstract: Generating tabular data under conditions is critical to applications requiring precise control over the generative process. Existing methods rely on training-time strategies that do not generalise to unseen constraints during in…
arXiv:2606.05328v1 Announce Type: cross Abstract: Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally encode physical structure, or…
arXiv:2606.06303v1 Announce Type: new Abstract: Controllable generation with discrete diffusion models is often hindered by high computational overhead or the need for retraining. In this paper, we present \underline{\textbf{G}}radient-\underline{\textbf{I}}nformed \underline{\te…
arXiv:2606.06007v1 Announce Type: new Abstract: Generating realistic synthetic sequential data is critical in real-world applications across operations research, finance, healthcare, energy systems, and scientific computing, where time-indexed observations are used for prediction…
Controllable generation with discrete diffusion models is often hindered by high computational overhead or the need for retraining. In this paper, we present \underline{\textbf{G}}radient-\underline{\textbf{I}}nformed \underline{\textbf{L}}ogit \underline{\textbf{C}}orrection (\t…
Generating realistic synthetic sequential data is critical in real-world applications across operations research, finance, healthcare, energy systems, and scientific computing, where time-indexed observations are used for prediction, simulation, risk assessment, and data-driven d…
arXiv:2606.04964v1 Announce Type: new Abstract: Diffusion language models (DLMs) generate text through iterative denoising, and blockwise decoding improves their practicality by committing tokens in local blocks. However, existing blockwise methods typically rely on fixed block s…
arXiv cs.AI
TIER_1English(EN)·Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or·
arXiv:2603.28762v2 Announce Type: replace-cross Abstract: Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This t…
arXiv:2602.14885v2 Announce Type: replace-cross Abstract: Recurrent neural networks (RNNs) provide a theoretical framework for understanding computation in biological neural circuits, yet classical results, such as Hopfield's model of associative memory, rely on symmetric connect…
arXiv cs.LG
TIER_1English(EN)·Haojun Qiu, Kiriakos N. Kutulakos, David B. Lindell·
arXiv:2606.04299v1 Announce Type: cross Abstract: We consider the problem of generating images whose internal structure -- defined by the distribution of patches across multiple scales -- matches that of a single reference image. Recent approaches address this problem by training…
arXiv:2606.05073v1 Announce Type: new Abstract: Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise fr…
Complexity-Balanced Splitting (CBS) allocates generative capacity across specialized sub-networks by partitioning the diffusion timeline based on local complexity measures, improving synthesis quality without increasing inference cost.
Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise from two distinct sources: some entries are meanin…
Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise from two distinct sources: some entries are meanin…
Diffusion language models (DLMs) generate text through iterative denoising, and blockwise decoding improves their practicality by committing tokens in local blocks. However, existing blockwise methods typically rely on fixed block sizes or delimiter-based runtime signals, which d…
arXiv:2606.03212v1 Announce Type: new Abstract: Low-rank tensor decomposition (TD) is usually effective on clean, fully observed data, but it often degrades under severe missingness or noise. Low-rankness is itself a useful but limited structural prior, and additional handcrafted…
arXiv:2606.02955v1 Announce Type: cross Abstract: Diffusion large language models promise parallel token generation, yet inference remains bottlenecked by deciding which masked tokens can be safely committed together. Fast-dLLM addressed this with KV caching and confidence-guided…
arXiv:2606.02607v1 Announce Type: cross Abstract: Tabular synthesis is critical for privacy-preserving sharing and augmentation, yet diffusion models rely on implicit mechanisms to capture inter-column relationships. We introduce Geometry-Aware Tabular Diffusion (GATD), which aug…
arXiv:2606.03393v1 Announce Type: new Abstract: We propose a novel diffusion model, Flicker-DDPM, which incorporates flicker (1/f) noise inspired by self-organized criticality (SOC), a widely observed phenomenon in natural systems. Unlike denoising diffusion probabilistic models …
arXiv:2606.03827v1 Announce Type: cross Abstract: In-silico trials of medical devices require the generation of virtual populations of anatomies. In cardiovascular applications, virtual anatomy is typically represented as a 3D+t mesh sampled from a generative model. However, most…
arXiv:2509.19305v2 Announce Type: replace-cross Abstract: Diffusion probability models have shown significant promise in offline reinforcement learning by directly modeling trajectory sequences. However, existing approaches primarily focus on time-domain features while overlookin…
arXiv:2606.03936v1 Announce Type: new Abstract: Neural operator surrogates (NO) approximate PDE solutions orders of magnitude faster than numerical solvers, but suffer from spectral bias: high-frequency content is systematically attenuated, limiting reliability where fine-scale s…
arXiv cs.AI
TIER_1English(EN)·Davide Gallon, Philippe von Wurstemberger, Patrick Cheridito, Arnulf Jentzen·
One of the primary challenges in Bayesian inference on the parameters of a diffusion model from discrete observations is the unavailability of an analytical expression for the transition density function between consecutive observation times, which is needed to derive the likelih…
Neural operator surrogates (NO) approximate PDE solutions orders of magnitude faster than numerical solvers, but suffer from spectral bias: high-frequency content is systematically attenuated, limiting reliability where fine-scale structure matters. Sparse sensor measurements of …
Neural operator surrogates (NO) approximate PDE solutions orders of magnitude faster than numerical solvers, but suffer from spectral bias: high-frequency content is systematically attenuated, limiting reliability where fine-scale structure matters. Sparse sensor measurements of …
arXiv cs.AI
TIER_1English(EN)·Alejandro F. Frangi·
In-silico trials of medical devices require the generation of virtual populations of anatomies. In cardiovascular applications, virtual anatomy is typically represented as a 3D+t mesh sampled from a generative model. However, most existing mesh generators focus on static anatomy,…
arXiv cs.AI
TIER_1English(EN)·Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong, Rob Brekelmans, Hui Liu, Yue Dong, Greg Ver Steeg·
arXiv:2606.01024v1 Announce Type: cross Abstract: Discrete Masked diffusion language models generate text by iterative parallel decoding, but few-step decoding suffers from a tradeoff between length and quality: with a fixed step budget, standard methods can generate a short, hig…
arXiv:2601.22651v2 Announce Type: replace-cross Abstract: Training-data attribution for vision generative models aims to identify which training data influenced a given output. While most methods score individual examples, practitioners often need group-level answers (e.g., artis…
arXiv cs.AI
TIER_1English(EN)·Abdullah Al Shafi, Kazi Saeed Alam, Sk Imran Hossain, Engelbert Mephu Nguifo·
arXiv:2606.00798v1 Announce Type: cross Abstract: Parameter compression of class-conditional diffusion models reveals an underexplored limitation in output-level distillation: the unconditional score branch remains unsupervised, leaving the classifier-free guidance gap underdeter…
arXiv:2606.00156v1 Announce Type: cross Abstract: Understanding the human brain requires access to its microscopic tissue architecture. Diffusion magnetic resonance imaging (MRI) provides the only noninvasive window into whole-brain microstructure in vivo, yet reliable quantitati…
arXiv:2606.00094v1 Announce Type: cross Abstract: Image generative models aim to sample data points from the underlying data manifold, a task that requires learning and decoding a dense, low-dimensional, and compact parameterization space. To achieve this, we propose the Data Man…
arXiv cs.AI
TIER_1English(EN)·Renhao Zhang, Haotian Fu, Mingxi Jia, George Konidaris, Yilun Du, Bruno Castro da Silva·
arXiv:2606.00336v1 Announce Type: new Abstract: We propose Parameterized Diffusion Policy (PDP), a framework for learning diffusion policies conditioned on low-dimensional, continuous parameters embedded in a learned behavior manifold. By constructing this manifold so that distan…
arXiv cs.LG
TIER_1English(EN)·Daniela Breitman, Andrei Mesinger, Steven G. Murray, Ivan Nikolic, Roberto Trotta·
arXiv:2606.00219v1 Announce Type: cross Abstract: We are witnessing a surge in observations of the cosmic dawn (CD) and epoch of reionisation (EoR), driving an increasing demand for fast and robust theoretical interpretation frameworks. In response, machine learning (ML), and emu…
arXiv:2606.00295v1 Announce Type: new Abstract: Masked diffusion models have seen great success in capturing data distributions over discrete sequences in domains such as text and proteins. These models generate data by iteratively unmasking tokens starting from a fully masked se…
arXiv:2606.00366v1 Announce Type: new Abstract: We consider the problem of generating a large collection of initial guesses for local minima of multimodal non-convex continuous optimization problems. The goal is for these initial guesses to be high-quality (i.e., a numerical solv…
arXiv cs.LG
TIER_1English(EN)·Simon De Reuver, Tamas Kristof Toth, Teddy Lazebnik·
arXiv:2606.00988v1 Announce Type: new Abstract: Symbolic regression (SR) offers a route to scientific discovery by converting observations into interpretable governing equations. However, despite its promise, its reliability degrades sharply when spatiotemporal measurements are s…
arXiv cs.LG
TIER_1English(EN)·Guanyu Zhou, Yao Liu, Yanglei Gan, Yuxiang Cai, Peng He, Run Lin, Yuxiang Liu, Qiao Liu·
arXiv:2606.01273v1 Announce Type: new Abstract: Spatio-temporal point processes (STPPs) provide a principled framework for modeling asynchronous events in continuous time and space. Recent diffusion-based approaches offer a flexible alternative to deterministic prediction by mode…
arXiv cs.LG
TIER_1English(EN)·Hamza Cherkaoui, H\'el\`ene Halconruy, Antonio Ocello·
arXiv:2605.13175v2 Announce Type: replace Abstract: Recent works have proposed incorporating heavy-tailed (HT) noise into diffusion- and flow-based generative models, with the goals of better recovering the tails of target distributions and improving generative diversity. This mo…
arXiv:2602.08689v2 Announce Type: replace Abstract: Diffusion models generate samples through an iterative denoising process guided by a pretrained neural network. Once the denoiser is fixed, the sampling algorithm itself (noise schedules, guidance scales, stochasticity profiles)…
arXiv:2606.02331v1 Announce Type: cross Abstract: Diffusion-based inverse problem solvers can produce realistic reconstructions, but realism alone does not ensure that the recovered details are supported by the measurement. We study this failure as measurement-conditioned halluci…
arXiv:2602.11453v2 Announce Type: replace-cross Abstract: In information retrieval (IR), learning-to-rank (LTR) methods have traditionally limited themselves to discriminative machine learning approaches that model the probability of the document being relevant to the query given…
arXiv cs.AI
TIER_1English(EN)·Yeongmin Kim, Donghyeok Shin, Byeonghu Na, Minsang Park, Richard Lee Kim, Il-Chul Moon·
arXiv:2602.03211v2 Announce Type: replace-cross Abstract: Diffusion models have demonstrated strong generative performance; however, generated samples often fail to fully align with human intent. This paper studies an efficient test-time scaling method for sampling from regions w…
arXiv cs.AI
TIER_1English(EN)·Ziseok Lee, Minyeong Hwang, Wooyeol Lee, Sanghyun Jo, Jihyung Ko, Young Bin Park, Jae-Mun Choi, Eunho Yang, Kyungsu Kim·
arXiv:2512.10339v2 Announce Type: replace Abstract: Inference-time steering adapts pretrained diffusion and flow models to new tasks without retraining, often utilizing ratio-of-densities constructions that reweight time-indexed marginals with fixed exponents. We identify Margina…
Diffusion-based inverse problem solvers can produce realistic reconstructions, but realism alone does not ensure that the recovered details are supported by the measurement. We study this failure as measurement-conditioned hallucination: visually meaningful content that is either…
Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using d…
arXiv cs.LG
TIER_1English(EN)·Alireza Kheirandish, Jihoon Hong, Sara Fridovich-Keil·
arXiv:2605.31596v1 Announce Type: cross Abstract: Diffusion models have shown promising performance as data-driven priors for computational imaging, as well as some capacity to detect out-of-distribution (OOD) images. However, existing approaches to OOD detection often require so…
arXiv cs.AI
TIER_1English(EN)·Enrico Cassano, Riccardo Renzulli, Marco Nurisso, Mirko Zaffaroni, Alan Perotti, Marco Grangetto·
arXiv:2509.21379v3 Announce Type: replace-cross Abstract: Concept unlearning in diffusion models is hampered by feature splitting, where concepts are distributed across many latent features, making their removal challenging and computationally expensive. We introduce SAEmnesia, a…
arXiv:2605.30753v1 Announce Type: new Abstract: Diffusion-based large language models (dLLMs) support parallel text generation via iterative denoising, yet inference remains latency-heavy because many steps are spent on redundant refinement and repeated remasking of tokens whose …
arXiv cs.AI
TIER_1English(EN)·Jinwoo Kim, S\'ekou-Oumar Kaba, Jiyun Park, Seunghoon Hong, Siamak Ravanbakhsh·
arXiv:2602.08267v2 Announce Type: replace-cross Abstract: We study the problem of transformation inversion on general Lie groups: a datum is transformed by an unknown group element, and the goal is to recover an inverse transformation that maps it back to the original data distri…
arXiv cs.LG
TIER_1English(EN)·Victor M. Yeom-Song, Severi Rissanen, Arno Solin, Samuel Kaski, Mingfei Sun·
arXiv:2512.14980v4 Announce Type: replace Abstract: Diffusion models have become a powerful generative prior for solutions of partial differential equations (PDEs). Existing approaches enforce physical constraints either by adding the PDE residuals as loss regularizers or through…
Decoupled Residual Denoising Diffusion models (DRDD) improve unified image-to-image translation by separating noise diffusion for domain harmonization from residual diffusion for semantic mapping, enhancing data efficiency and performance.
Diffusion models have shown promising performance as data-driven priors for computational imaging, as well as some capacity to detect out-of-distribution (OOD) images. However, existing approaches to OOD detection often require some knowledge of the shifted distribution, fail to …
arXiv:2605.29809v1 Announce Type: cross Abstract: Large-scale text-to-image (T2I) diffusion models have enabled unprecedented creative applications, but their unauthorized use has raised serious intellectual property concerns, making model ownership verification (MOV) increasingl…
arXiv:2605.16415v2 Announce Type: replace-cross Abstract: The creativity of diffusion models refers to their ability to generate highly realistic images that are different from their training data. Creativity is somewhat surprising since it is known that if the denoiser used in t…
arXiv cs.LG
TIER_1English(EN)·Benjamin A. Burns, Sara Fridovich-Keil·
arXiv:2605.30330v1 Announce Type: new Abstract: Diffusion models have excellent capacity to model complex distributions of natural data, which has made them a popular and effective choice for posterior sampling in imaging inverse problems. Existing methods can incorporate any mea…
arXiv:2605.29932v1 Announce Type: new Abstract: Forecasting the progression of neurodegenerative diseases, such as Parkinson's disease, is essential for effective long-term planning and personalized therapeutic intervention. Existing systems typically produce scalar clinical scor…
arXiv cs.LG
TIER_1English(EN)·Gabriel Moreira, Manuel Marques, Jo\~ao Paulo Costeira, Chenyan Xiong·
arXiv:2605.28900v1 Announce Type: new Abstract: We introduce Spectral Guidance, a framework for controlling diffusion models by leveraging the intrinsic geometry of the generative process. As data is progressively corrupted by noise, only a small number of features remain informa…
arXiv:2511.19316v2 Announce Type: replace-cross Abstract: Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic styles, but also introduce copyright and security risks. Dataset watermarking has been p…
arXiv cs.AI
TIER_1English(EN)·Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch·
arXiv:2507.16880v3 Announce Type: replace-cross Abstract: Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicat…
arXiv:2605.29716v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive generative paradigm. Given the prohibitive computational cost of full fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) has become the standard…
arXiv cs.AI
TIER_1English(EN)·Dueun Kim, Albert No·
arXiv:2605.29123v1 Announce Type: new Abstract: Masked diffusion language models (MDMs) uniquely support any-order generation, with confidence-based decoding currently serving as the de facto standard inference policy. To optimize for this, recent training schemes attempt to alig…
arXiv:2605.28902v1 Announce Type: new Abstract: Concept erasure has emerged as a promising approach to mitigate undesired or unsafe content in diffusion models, yet existing methods still face significant limitations. While training-based methods are effective, their high computa…
Diffusion models have excellent capacity to model complex distributions of natural data, which has made them a popular and effective choice for posterior sampling in imaging inverse problems. Existing methods can incorporate any measurement model at inference time but must use an…
arXiv cs.AI
TIER_1English(EN)·Seunghyeok Shin, Minwoo Kim, Dabin Kim, Hongki Lim·
arXiv:2605.27990v1 Announce Type: cross Abstract: Diffusion posterior sampling conditions diffusion priors on measurements, but data-consistency updates are typically scaled by hand-tuned guidance weights and can destabilize sampling under stiff, operator-dependent curvature. We …
arXiv cs.LG
TIER_1English(EN)·Abduragim Shtanchaev, Albina Ilina, Yazid Janati, Arip Asadulaev, Martin Takac, Eric Moulines·
arXiv:2603.07860v2 Announce Type: replace Abstract: Pretrained diffusion models are effective priors for Bayesian inverse problems, but posterior sampling with these priors is often costly because data-consistency guidance is applied throughout the full reverse trajectory. Existi…
arXiv cs.AI
TIER_1English(EN)·Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon·
arXiv:2510.21890v2 Announce Type: replace-cross Abstract: This book presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by def…
arXiv:2605.27476v1 Announce Type: cross Abstract: We characterize the pre-softmax attention matrix $\mathbf{QK^\top}$ in transformers as an associative memory matrix encoding pairwise associations between input features. By decomposing this matrix into its symmetric and skew-symm…
arXiv:2605.27813v1 Announce Type: cross Abstract: Text-to-image diffusion models generate images through an iterative denoising process, so internal neural layers produce trajectories of activations rather than single static representations. Sparse autoencoders (SAEs) have recent…
arXiv cs.LG
TIER_1English(EN)·Andrew Millard, Fredrik Lindsten, Zheng Zhao·
arXiv:2601.23262v2 Announce Type: replace Abstract: We introduce a guided stochastic sampling method that augments sampling from diffusion models with physics-based guidance derived from partial differential equation (PDE) residuals and observational constraints, ensuring generat…
arXiv cs.LG
TIER_1English(EN)·Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen·
arXiv:2605.27495v1 Announce Type: cross Abstract: Data availability remains a critical bottleneck in many deep learning applications. Large-scale datasets are often expensive to collect, curate and annotate, which can limit the scalability and applicability of supervised learning…
arXiv:2605.27736v1 Announce Type: new Abstract: Online reinforcement learning is becoming increasingly important for aligning diffusion models with non-differentiable objectives. However, existing methods still face limitations in assigning fine-grained credit along denoising tra…
arXiv:2602.20497v3 Announce Type: replace-cross Abstract: Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers (DiTs) pose a significant challenge to their practical deployment. Wh…
arXiv:2602.18647v2 Announce Type: replace-cross Abstract: We introduce InfoNoise, an online adaptive noise schedule for diffusion training that reallocates optimization effort toward noise levels where denoising is most informative. Together with loss weighting, a noise schedule …
Diffusion models exhibit spectral bias in image synthesis, and a new sampling method called Colored Noise Sampling addresses this by dynamically allocating energy based on frequency-dependent schedules, leading to improved image quality metrics.
arXiv:2605.26582v1 Announce Type: cross Abstract: Discrete diffusion models achieve strong performance in text and image generation, but their inference remains slow and must inherently balance sampling efficiency and sample quality. In this work, we present a systematic study of…
arXiv:2510.03352v3 Announce Type: replace-cross Abstract: Diffusion models have been used as priors for solving inverse problems. However, existing approaches typically overlook side information that could significantly improve reconstruction quality, especially in severely ill-p…
arXiv cs.LG
TIER_1English(EN)·Xing Cong, Hanlin Tang, Kan Liu, Lan Tao, Lin Qu, Chenhao Xie·
arXiv:2605.26632v1 Announce Type: new Abstract: Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly …
arXiv cs.LG
TIER_1English(EN)·Loukas Sfountouris, Giannis Daras, Paris Giampouras·
arXiv:2511.16870v3 Announce Type: replace-cross Abstract: Enforcing alignment between the internal representations of diffusion or flow-based generative models and those of pretrained self-supervised encoders has recently been shown to provide a powerful inductive bias, improving…
arXiv cs.LG
TIER_1English(EN)·Nicola Novello, Federico Fontana, Luigi Cinque, Deniz Gunduz, Andrea M. Tonello·
arXiv:2509.21167v2 Announce Type: replace Abstract: Most existing methods for concept unlearning in text-to-image diffusion models minimize a mean squared error (MSE) loss between the denoiser outputs conditioned on a target and an anchor concept, which is implicitly the KL diver…
arXiv cs.LG
TIER_1English(EN)·Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen·
arXiv:2605.27343v1 Announce Type: cross Abstract: Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such…
arXiv:2605.27102v1 Announce Type: cross Abstract: Flow matching with clean-data prediction has shown that regressing the clean point can exploit low-dimensional structure more effectively than predicting an ambient noised quantity. We ask whether this principle remains useful aft…
arXiv cs.LG
TIER_1English(EN)·Gwangho Kim, Sungyoon Lee·
arXiv:2605.26756v1 Announce Type: new Abstract: Diffusion models can unintentionally memorize training samples, raising concerns about privacy and copyright. While recent methods can detect memorization, they often rely on global or model-specific signals and provide limited insi…
arXiv cs.AI
TIER_1English(EN)·Junseo Bang, Joonhee Lee, Kyeonghyun Lee, Haechang Lee, Dong Un Kang, Se Young Chun·
arXiv:2506.07813v2 Announce Type: replace-cross Abstract: Arbitrary-scale image super-resolution aims to upsample images to any desired resolution, offering greater flexibility than traditional fixed-scale super-resolution. Recent approaches based on regression-based or generativ…
Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text prompts or semantic maps, which require e…
Flow matching with clean-data prediction has shown that regressing the clean point can exploit low-dimensional structure more effectively than predicting an ambient noised quantity. We ask whether this principle remains useful after images are mapped into a learned latent space, …
Discrete diffusion models achieve strong performance in text and image generation, but their inference remains slow and must inherently balance sampling efficiency and sample quality. In this work, we present a systematic study of how the \emph{degree of stochasticity} in Markov …
arXiv cs.AI
TIER_1English(EN)·Matthew Niedoba, Berend Zwartsenberg, Frank Wood·
arXiv:2605.24192v1 Announce Type: cross Abstract: The neural-network denoising functions which form the backbone of image diffusion models are remarkably consistent in their generalization behaviour across a wide variety of network architectures and training procedure hyperparame…
arXiv cs.AI
TIER_1English(EN)·Sol Park, Soobin Um·
arXiv:2605.24631v1 Announce Type: cross Abstract: Minority sampling aims to generate low-density instances on a data manifold and is of central importance in applications such as medical diagnosis, anomaly detection, and creative AI. Existing approaches, however, define minority …
arXiv cs.AI
TIER_1English(EN)·Weixin Wang, Yu Yang, Wei Deng, Pan Xu·
arXiv:2605.25123v1 Announce Type: cross Abstract: We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without updating its weights. Recent Sequential Monte Carlo (SMC)-based steering methods approximate …
arXiv:2605.25681v1 Announce Type: cross Abstract: Designing a single molecule that modulates two targets is a promising strategy for polypharmacology, but it remains substantially harder than standard single-target generation because one candidate must satisfy two binding require…
arXiv:2602.05961v2 Announce Type: replace Abstract: Sampling from a distribution $p(x) \propto e^{-\mathcal{E}(x)}$ known up to a normalising constant is an important and challenging problem in statistics. Recent years have seen the rise of a new family of amortised sampling algo…
arXiv cs.LG
TIER_1English(EN)·Sungwon Park, Anthony Zhou, Hongjoong Kim, Amir Barati Farimani·
arXiv:2602.04139v2 Announce Type: replace Abstract: Neural operators provide a powerful framework for learning discretization invariant mappings between function spaces, but standard deterministic models do not capture predictive uncertainty. We introduce diffusion last layer (DL…
arXiv:2510.01384v4 Announce Type: replace Abstract: A natural desideratum for generative models is self-correction--detecting and revising low-quality tokens at inference. While Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discret…
arXiv:2510.01184v2 Announce Type: replace Abstract: We present a mechanism to steer the sampling diversity of denoising diffusion and flow matching models, allowing users to sample from a sharper or broader distribution than the training distribution. We build on the observation …
arXiv:2605.25111v1 Announce Type: new Abstract: Pre-propagation graph neural networks (PPGNNs) decouple node feature propagation from transformation: graph diffusion is performed once as preprocessing, and training reduces to dense per-node transformations. This design enables mi…
arXiv cs.CL
TIER_1English(EN)·Ke Lin, Yiyang Luo, Zhaolong Su, Yunya Song, Anyi Rao·
arXiv:2605.25969v1 Announce Type: new Abstract: Causal Transformer language models suffer from strictly sequential decoding and a quadratic per-step attention cost. While linear-time causal models and discrete diffusion models each address these weaknesses, their integration rema…
arXiv:2605.25210v1 Announce Type: cross Abstract: Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or…
arXiv:2605.19021v2 Announce Type: replace Abstract: Deep Graph Neural Networks (GNNs) are essential for capturing complex dependencies in graph-structured data. However, scaling GNNs to depth remains challenging, as stacking layers leads to representation collapse and diminishing…
arXiv cs.AI
TIER_1English(EN)·Zixin Jessie Chen, Zhuo Chen, Archer Wang, Jeff Gore, William T. Freeman, Congyue Deng, Marin Solja\v{c}i\'c·
arXiv:2605.26032v1 Announce Type: cross Abstract: Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introd…
arXiv cs.AI
TIER_1English(EN)·Shaorong Zhang, Rob Brekelmans, Greg Ver Steeg·
arXiv:2510.07343v3 Announce Type: replace-cross Abstract: Diffusion Posterior Sampling (DPS) provides a principled Bayesian approach to inverse problems by sampling from $p(x_0 \mid y)$. While posterior sampling is valuable for capturing uncertainty and multi-modality, many class…
Generative posterior sampling using diffusion models has emerged as a dominant paradigm for solving inverse problems in imaging, which usually consists of three main components: data consistency (DC) guidance, classifier-free guidance (CFG) and stochasticity. While prior arts hav…
Latent diffusion models using clean-data prediction outperform velocity prediction in compressed representations, demonstrating that prediction targets are geometrically dependent rather than algebraically interchangeable.
Diffusion Transformers achieve strong image generation performance but face high inference costs; this work proposes RT-Lynx, which uses activation sparsification and optimized CUDA kernels to accelerate inference while maintaining generation quality.
The symmetric and skew-symmetric components of transformer attention matrices are analyzed as governing energy landscape structure and circulation dynamics, respectively, with implications for generation trade-offs.
Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce $\textbf{SKILD}$, a $\textbf{S}$cale-invariant…
Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce $\textbf{SKILD}$, a $\textbf{S}$cale-invariant…
Causal Transformer language models suffer from strictly sequential decoding and a quadratic per-step attention cost. While linear-time causal models and discrete diffusion models each address these weaknesses, their integration remains inherently inconsistent: diffusion requires …
Designing a single molecule that modulates two targets is a promising strategy for polypharmacology, but it remains substantially harder than standard single-target generation because one candidate must satisfy two binding requirements while preserving drug-likeness and synthesiz…
arXiv cs.LG
TIER_1English(EN)·Egor Lifar, Semyon Savkin, Timur Garipov, Shangyuan Tong, Tommi Jaakkola·
arXiv:2605.23275v1 Announce Type: new Abstract: In this paper, we propose Diffusion Domain Expansion (DDE), a method that efficiently extends pre-trained diffusion models to generate larger objects and handle more complex conditioning beyond their original capabilities. Our metho…
arXiv:2602.18176v3 Announce Type: replace Abstract: Masked Diffusion Models (MDMs) enable flexible decoding orders, yet existing samplers remain largely greedy, selecting locally certain tokens without accounting for their downstream effects. We show that this myopia can increase…
arXiv cs.LG
TIER_1English(EN)·Jaihoon Kim, Taehoon Yoon, Prin Phunyaphibarn, Seungjun Kim, Morteza Mardani, Minhyuk Sung·
arXiv:2605.23346v1 Announce Type: new Abstract: Discrete diffusion models have emerged as powerful frameworks for generating structured categorical data. However, efficiently sampling from reward-tilted distributions remains a fundamental challenge. While Twisted Sequential Monte…
arXiv cs.LG
TIER_1English(EN)·Benjamin Rozonoyer, Jacopo Minniti, Dhruvesh Patel, Neil Band, Avishek Joey Bose, Tim G. J. Rudner, Andrew McCallum·
arXiv:2605.22967v1 Announce Type: new Abstract: When Masked Diffusion Models (MDMs) generate sequences through iterative refinement, the rich internal computation over masked positions is discarded, forcing every subsequent refinement step to recompute the valuable internal infor…
SKILD is a scale-invariant k-space image learning diffusion model that unifies image generation and continuous super-resolution through a single unconditional framework by leveraging scale invariance in image content and physics systems.
B³D-RWKV combines diffusion and RWKV architectures to achieve parallel, bidirectional processing with improved decoding speed while maintaining competitive accuracy.
Visual Concept Fusion enables dual text and image conditioning in diffusion models through feature alignment and fusion strategies without requiring retraining.
arXiv cs.CL
TIER_1English(EN)·Chunsan Hong, Sanghyun Lee, Jong Chul Ye·
arXiv:2602.02112v2 Announce Type: replace-cross Abstract: Masked diffusion models (MDMs) are a potential alternative to autoregressive models (ARMs) for language generation, but generation quality depends critically on the generation order. Prior work either hard-codes an orderin…
arXiv:2605.21568v1 Announce Type: new Abstract: In this work, we extend the Equilibrium Propagation framework to skew-gradient systems and show an equivalence between deep Energy-Based Models and Hamiltonian neural networks. We focus on networks of diffusively coupled Fitzhugh-Na…
arXiv:2511.18159v2 Announce Type: replace Abstract: Masked diffusion models (MDMs) are a promising alternative to autoregressive models (ARMs), but they suffer from inherently much higher training variance. High variance leads to noisier gradient estimates and unstable optimizati…
arXiv:2605.21514v1 Announce Type: cross Abstract: Many complex systems can be modeled by temporal networks, whose organization often evolves through distinct structural phases. Detecting the change points that delimit these phases is both important and challenging. In this work, …
arXiv cs.LG
TIER_1English(EN)·Seo Taek Kong, Weina Wang, R. Srikant·
arXiv:2605.21911v1 Announce Type: new Abstract: We develop a principled framework for analyzing and designing noise schedules in diffusion models. We show that one can recast this design problem as an optimal control problem, whose state is the Fisher information of the diffusion…
arXiv:2605.20235v1 Announce Type: cross Abstract: Diffusion models generate high-dimensional data with remarkable quality, yet how their training efficiently learns the score function, bypassing the curse of dimensionality when data is supported on low-dimensional manifolds, rema…
arXiv cs.AI
TIER_1English(EN)·Chenyang An, Xiaoqian Xu·
arXiv:2605.20623v1 Announce Type: cross Abstract: We establish explicit lower bounds for advection-diffusion equations in three settings: a polynomial $\dot H^{-1}$ bound for inviscid shears with $u\in L^\infty_t W^{1,1}_y$, a uniform positive lower bound on the mixing scale for …
arXiv cs.AI
TIER_1English(EN)·Luca Maria Del Bono, Giulio Biroli, Patrick Charbonneau, Marylou Gabri\'e·
arXiv:2605.12597v2 Announce Type: replace-cross Abstract: Computational sampling has been central to the sciences since the mid-20th century. While machine-learning-based approaches have recently enabled major advances, their behavior remains poorly understood, with limited theor…
arXiv:2605.22586v1 Announce Type: cross Abstract: This tutorial develops diffusion models from the viewpoint of differential equations. We begin with the conditional Gaussian forward process and show that this path admits both an ordinary differential equation (ODE) representatio…
Contrastive Distribution Matching addresses efficient sampling from reward-tilted distributions in discrete diffusion models through learned twist functions that reduce computational overhead while maintaining accuracy across diverse applications.
This tutorial develops diffusion models from the viewpoint of differential equations. We begin with the conditional Gaussian forward process and show that this path admits both an ordinary differential equation (ODE) representation and a stochastic differential equation (SDE) rep…
Diffusion Transformers (DiTs) achieve superior image generation quality but suffer from quadratic computational complexity relative to token count. While various token reduction (TR) methods have been proposed to mitigate this cost, they overlook the primary objective of generati…
Discrete diffusion models are often trained through clean-data prediction, but the prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform Diffusion Models (UDM) they do not. We s…
We establish explicit lower bounds for advection-diffusion equations in three settings: a polynomial $\dot H^{-1}$ bound for inviscid shears with $u\in L^\infty_t W^{1,1}_y$, a uniform positive lower bound on the mixing scale for diffusive shears, and an exponential $L^2$ bound f…
Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are app…
Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are app…
Diffusion Language Models (DLMs) provide a promising alternative to autoregressive language models by generating text through iterative denoising and bidirectional refinement. However, this iterative generation paradigm also introduces unique safety vulnerabilities when harmful t…
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling MDMs to larger models. Therefore, we ask t…
Drifting Models have emerged as a new paradigm for one-step generative modeling, achieving strong image quality without iterative inference. The premise is to replace the iterative denoising process in diffusion models with a single evaluation of a generator. However, this create…
Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive language models, offering stronger global awareness and highly parallel generation. However, post-training DLMs with standard Negative Evidence Lower Bound (NELBO)-based supervised…
Diffusion models have achieved remarkable success, yet their training remains inefficient due to a severe optimization bottleneck, which we term Representation Degradation. As noise levels increase, the outputs of the trained model exhibit progressive structural distortion, which…
We propose kernel-gradient drifting, a one-step generative modeling framework that replaces the fixed Euclidean displacement direction in drifting models with directions induced by the kernel itself. Standard drifting is attractive because it enables fast, high-quality generation…
We propose kernel-gradient drifting, a one-step generative modeling framework that replaces the fixed Euclidean displacement direction in drifting models with directions induced by the kernel itself. Standard drifting is attractive because it enables fast, high-quality generation…
Pretrained diffusion models provide powerful learned priors, but in scientific sampling the target distribution often depends on physical context that is not fully represented by one generative model. We introduce Generative Gibbs for Physics-Aware Sampling (GG-PA), a training-fr…
Diffusion large language models (dLLMs) offer a promising route to parallel and efficient text generation, but improving their reasoning ability requires effective post-training. Reinforcement learning with verifiable rewards (RLVR) is a natural choice for this purpose, yet its a…
Erasing specific concepts from text-to-image diffusion models is essential for avoiding the generation of copyrighted and explicit content. Closed-form concept erasure methods offer a fast alternative to backpropagation-based techniques, but they become less effective when scalin…
Masked diffusion language models enable parallel token generation and offer improved decoding efficiency over autoregressive models. However, their performance degrades significantly when generating multiple tokens simultaneously, due to a mismatch between token-level training ob…
Diffusion models perform remarkably well on high-dimensional data such as images, often using only a modest number of reverse-time steps. Despite this practical success, existing convergence theory does not fully explain why such samplers remain efficient in high dimensions. Many…
Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent diffusion modeling is constructing a suit…
Classifier-Free Guidance (CFG) is a widely used mechanism for controlling diffusion-based generative models, yet its guidance scale is typically treated as a fixed hyperparameter throughout generation. This static design yields a suboptimal controllability and quality tradeoff, a…
Inference-time controllable generation is essential for real-world applications of unconditional diffusion models. However, most existing techniques focus on individual samples, struggling in applications that require the sample population to follow specific attribute distributio…
arXiv cs.LG
TIER_1English(EN)·Benjamin Sterling, Yousef El-Laham, M\'onica F. Bugallo·
arXiv:2509.14225v3 Announce Type: replace Abstract: Recent advances in generative artificial intelligence applications have raised new data security concerns. This paper focuses on defending diffusion models against membership inference attacks. This type of attack occurs when th…
arXiv:2605.06507v1 Announce Type: cross Abstract: Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need t…
arXiv:2605.06172v1 Announce Type: cross Abstract: Many normalizing flow architectures impose regularity constraints, yet their distributional approximation properties are not fully characterized. We study the expressivity of bi-Lipschitz normalizing flows through the lens of scor…
arXiv cs.LG
TIER_1English(EN)·Sankarshana Venugopal (Seoul National University), Mohammad Mostafavi (Seoul National University), Jonghyun Choi (Seoul National University)·
arXiv:2605.05889v1 Announce Type: cross Abstract: Diffusion-based image-to-image (I2I) translation excels in high-fidelity generation but suffers from slow sampling in state-of-the-art Diffusion Bridge Models (DBMs), often requiring dozens of function evaluations (NFEs). We intro…
arXiv:2605.06553v1 Announce Type: new Abstract: We present EDDY (Exact-marginal Diversification via Divergence-free dYnamics), a guidance mechanism for diffusion and flow matching models that promotes diversity among samples generated while maintaining quality. EDDY exploits symm…
arXiv cs.LG
TIER_1English(EN)·Matias G. Delgadino, Sebastien Motsch, Advait Parulekar, William Porteous, Sanjay Shakkottai·
arXiv:2605.06538v1 Announce Type: new Abstract: Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: eve…
arXiv cs.LG
TIER_1English(EN)·Alexander Conzelmann, Albert Catalan-Tatjer, Shiwei Liu·
arXiv:2605.06366v1 Announce Type: new Abstract: Diffusion language models (DLMs) have recently emerged as competitive alternatives to autoregressive (AR) language models, yet differences in their activation dynamics remain poorly understood. We characterize these dynamics in LLaD…
arXiv cs.LG
TIER_1English(EN)·Eugenio Lomurno, Filippo Balzarini, Francesco Benelle, Francesca Pia Panaccione, Matteo Matteucci·
arXiv:2605.06261v1 Announce Type: new Abstract: Diffusion-based generators set the current state of the art for synthetic tabular data. These methods approach but rarely exceed real-data utility, and closing this synthetic-real gap has so far been pursued exclusively at training …
arXiv cs.CL
TIER_1Français(FR)·Hongcan Guo, Qinyu Zhao, Yian Zhao, Shen Nie, Rui Zhu, Qiushan Guo, Feng Wang, Tao Yang, Hengshuang Zhao, Guoqiang Wei, Yan Zeng·
arXiv:2605.06548v1 Announce Type: new Abstract: Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve gene…
arXiv:2605.05387v1 Announce Type: new Abstract: We study zero-shot conditional sampling with pretrained diffusion models for linear inverse problems, including inpainting and super-resolution. In these problems, the observation determines only part of the unknown signal. The rema…
arXiv:2605.06077v1 Announce Type: new Abstract: This position paper argues that understanding generalization in diffusion models requires fundamentally new theoretical frameworks that go beyond both classical statistical learning theory and the benign overfitting paradigm develop…
arXiv:2605.06169v1 Announce Type: new Abstract: Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. …
arXiv cs.LG
TIER_1English(EN)·Tongda Xu, Mingwei He, Shady Abu-Hussein, Jose Miguel Hernandez-Lobato, Chunhang Zheng, Kai Zhao, Chao Zhou, Ya-Qin Zhang, Yan Wang·
arXiv:2603.05630v2 Announce Type: replace-cross Abstract: It is well known that the reconstruction FID (rFID) of a VAE is poorly correlated with the generation FID (gFID) of a latent diffusion model. We propose interpolated FID (iFID), a simple variant of rFID that exhibits a str…
arXiv:2602.00175v2 Announce Type: replace Abstract: Text-to-image diffusion models (DMs) are frequently abused to produce harmful or copyrighted content, violating public interests. Concept erasure (unlearning) is a promising paradigm to alleviate this issue. However, there exist…
We present EDDY (Exact-marginal Diversification via Divergence-free dYnamics), a guidance mechanism for diffusion and flow matching models that promotes diversity among samples generated while maintaining quality. EDDY exploits symmetries of the Fokker-Planck equation, using drif…
Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: even with exact prior scores, their outputs are bia…
arXiv:2603.16368v2 Announce Type: replace-cross Abstract: Striking a balance between efficiency and transparent motion is a core challenge in human-robot collaboration, as highly expressive movements often incur unnecessary time and energy costs. In collaborative environments, le…
arXiv:2605.04215v1 Announce Type: new Abstract: Diffusion-based Large Language Models (D-LLMs) represent a promising frontier in generative AI, offering fully parallel token generation that can lead to significant throughput advantages and superior GPU utilization over traditiona…
arXiv cs.LG
TIER_1English(EN)·Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli, Arnaud Doucet·
arXiv:2605.05118v1 Announce Type: new Abstract: Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of stee…
arXiv:2605.05024v1 Announce Type: cross Abstract: Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propo…
arXiv:2605.05206v1 Announce Type: cross Abstract: We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carry…
arXiv:2510.08431v3 Announce Type: replace-cross Abstract: Although continuous-time consistency models (e.g., sCM, MeanFlow) are theoretically principled and empirically powerful for fast academic-scale diffusion, its applicability to large-scale text-to-image and video tasks rema…
arXiv cs.LG
TIER_1English(EN)·Riccardo de Lutio, Tobias Fischer, Yen-Yu Chang, Yuxuan Zhang, Jay Zhangjie Wu, Xuanchi Ren, Tianchang Shen, Katarina Tothova, Zan Gojcic, Haithem Turki·
arXiv:2603.00492v2 Announce Type: replace-cross Abstract: Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. Methods that leverage generative priors to correct artifact…
arXiv cs.LG
TIER_1English(EN)·Andreas Makris, Paul Fearnhead, Chris Nemeth·
arXiv:2605.03712v1 Announce Type: cross Abstract: Training-free conditional diffusion provides a flexible alternative to task-specific conditional model training, but existing samplers often allocate computation inefficiently: independent guided trajectories can vary widely in qu…
arXiv cs.LG
TIER_1English(EN)·Francesca Romana Crucinio·
arXiv:2507.04330v2 Announce Type: replace-cross Abstract: We consider the problem of sampling from a probability distribution $\pi$ which admits a density w.r.t. a dominating measure. It is well known that this can be written as an optimisation problem over the space of probabili…
arXiv:2601.19312v2 Announce Type: replace Abstract: The Schrodinger Bridge and Bass (SBB) formulation, which jointly controls drift and volatility, is an established extension of the classical Schrodinger Bridge (SB). Building on this framework, we introduce LightSBB-M, an algori…
arXiv:2605.02973v1 Announce Type: new Abstract: Modality translation is inherently under-constrained, as multiple cross-modal mappings may yield the same marginals. Recent work has shown that diffusion bridges are effective for this task. However, most existing approaches rely on…
arXiv cs.LG
TIER_1English(EN)·James Rowbottom, Elizabeth L. Baker, Nick Huang, Ben Adcock, Carola-Bibiane Sch\"onlieb, Alexander Denker·
arXiv:2605.03497v1 Announce Type: new Abstract: Score-based diffusion models in infinite-dimensional function spaces provide a mathematically principled framework for modelling function-valued data, offering key advantages such as resolution invariance and the ability to handle i…
arXiv cs.LG
TIER_1English(EN)·Aaron Havens, Brian Karrer, Neta Shaul·
arXiv:2605.03984v1 Announce Type: new Abstract: Sampling from unnormalized densities is analogous to the generative modeling problem, but the target distribution is defined by a known energy function instead of data samples. Because evaluating the energy function is often costly,…
arXiv cs.LG
TIER_1English(EN)·Francisco M. Castro-Mac\'ias, Pablo Morales-\'Alvarez, Saifuddin Syed, Daniel Hern\'andez-Lobato, Rafael Molina, Jos\'e Miguel Hern\'andez-Lobato·
arXiv:2605.04013v1 Announce Type: cross Abstract: Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference …
arXiv cs.LG
TIER_1English(EN)·José Miguel Hernández-Lobato·
Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference and the target distribution. Parallel Tempering (P…
Sampling from unnormalized densities is analogous to the generative modeling problem, but the target distribution is defined by a known energy function instead of data samples. Because evaluating the energy function is often costly, a primary challenge is to learn an efficient sa…
Dataset distillation enables efficient training by distilling the information of large-scale datasets into significantly smaller synthetic datasets. Diffusion based paradigms have emerged in recent years, offering novel perspectives for dataset distillation. However, they typical…
Training-free conditional diffusion provides a flexible alternative to task-specific conditional model training, but existing samplers often allocate computation inefficiently: independent guided trajectories can vary widely in quality, and additional function evaluations along a…
Score-based diffusion models in infinite-dimensional function spaces provide a mathematically principled framework for modelling function-valued data, offering key advantages such as resolution invariance and the ability to handle irregular discretisations. However, practical imp…
arXiv cs.LG
TIER_1English(EN)·Phil Sidney Ostheimer, Mayank Nagda, Andriy Balinskyy, Gabriel Vicente Rodrigues, Jean Radig, Carl Herrmann, Stephan Mandt, Marius Kloft, Sophie Fellenz·
arXiv:2605.01817v1 Announce Type: new Abstract: Diffusion models (DMs) excel on dense continuous data, but are not designed for sparse continuous data. They do not model exact zeros that represent the deliberate absence of a signal. As a result, they erase sparsity patterns and p…
arXiv cs.LG
TIER_1English(EN)·Tongzhen Dang, Weiyang Ding, Michael K. Ng·
arXiv:2605.01691v1 Announce Type: new Abstract: In this paper, we propose Complex Diffusion Maps (CDM), a novel diffusion mapping framework that aims to reveal the dominant complex harmonics of high-dimensional data. Inspired by the local Gaussian kernel relevant to the heat equa…
arXiv:2605.00161v1 Announce Type: new Abstract: Diffusion language models (DLMs) are an attractive alternative to autoregressive models because they promise sublinear-time, parallel generation, yet practical gains remain elusive as high-quality samples still demand hundreds of re…
arXiv cs.LG
TIER_1English(EN)·Carles Domingo-Enrich, Yuanqi Du, Michael S. Albergo·
arXiv:2605.00229v1 Announce Type: cross Abstract: We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density; a formulation that subsumes both sampling from unnormalized densities a…
arXiv:2509.20098v2 Announce Type: replace Abstract: Learning physical dynamics from data is a fundamental challenge in machine learning and scientific modeling. Real-world observational data are inherently incomplete and irregularly sampled, posing significant challenges for exis…
arXiv cs.AI
TIER_1English(EN)·Yonggan Fu, Lexington Whalen, Zhifan Ye, Xin Dong, Shizhe Diao, Jingyu Liu, Chengyue Wu, Hao Zhang, Enze Xie, Song Han, Maksim Khadkevich, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov·
arXiv:2512.14067v2 Announce Type: replace-cross Abstract: Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained…
arXiv cs.AI
TIER_1English(EN)·Michael Cardei, Huu Binh Ta, Ferdinando Fioretto·
arXiv:2604.26985v1 Announce Type: cross Abstract: Masked diffusion models (MDMs) generate discrete sequences by iterative denoising under an absorbing masking process. In standard masked diffusion, if a token remains masked after a reverse update, the model discards its clean-sta…
arXiv:2604.27443v1 Announce Type: cross Abstract: Generating continuous-time, continuous-space stochastic processes (e.g., videos, weather forecasts) conditioned on partial observations (e.g., first and last frames) is a fundamental challenge. Existing approaches, (e.g., diffusio…
arXiv cs.LG
TIER_1English(EN)·Hyukjun Lim, Soojung Yang, Lucas Pin\`ede, Miguel Steiner, Yuanqi Du, Rafael G\'omez-Bombarelli·
arXiv:2603.25980v2 Announce Type: replace-cross Abstract: Transition states, the first-order saddle points on the potential energy surfaces, govern the kinetics and mechanisms of chemical reactions and conformational changes. Locating them is challenging because transition pathwa…
arXiv:2510.18165v3 Announce Type: replace-cross Abstract: Diffusion language models (DLMs) are emerging as a compelling alternative to the dominant autoregressive paradigm, offering inherent advantages in parallel generation and bidirectional context modeling. However, for the ta…
arXiv:2603.20645v2 Announce Type: replace Abstract: Diffusion models have become a leading framework in generative modeling, yet their theoretical understanding -- especially for high-dimensional data concentrated on low-dimensional structures -- remains incomplete. This paper in…
arXiv:2604.24832v1 Announce Type: new Abstract: Masked diffusion language models (MDMs) have recently emerged as a promising alternative to standard autoregressive large language models (AR-LLMs), yet their optimization can be substantially less stable. We study blockwise MDMs an…
Practically, training diffusion models typically requires explicit time conditioning to guide the network through the denoising sampling process. Especially in deterministic methods like DDIM, the absence of time conditioning leads to significant performance degradation. However,…
arXiv:2604.18471v2 Announce Type: replace Abstract: Discrete diffusion language models (dLLMs) have recently emerged as a promising alternative to traditional autoregressive approaches, offering the flexibility to generate tokens in arbitrary orders and the potential of parallel …
arXiv cs.LG
TIER_1English(EN)·Yiming Zhang, Sitong Liu, Ke Li, Zhihong Wu, Alex Cloninger, Melvin Leok·
arXiv:2604.24238v1 Announce Type: new Abstract: Diffusion models are a leading paradigm for data generation, but training-free editing typically re-runs the full denoising trajectory for every edit strength, making iterative refinement expensive. To address this issue, we instead…
arXiv:2604.24357v1 Announce Type: new Abstract: Diffusion language models generate without a fixed left-to-right order, making token ordering a central algorithmic choice: which tokens should be revealed, retained, revised or verified at each step? Existing systems mainly use ran…
arXiv:2604.23806v1 Announce Type: new Abstract: The reverse process in score-based diffusion models is formally equivalent to overdamped Langevin dynamics in a time-dependent energy landscape. In our prior work we showed that a bilinearly-coupled analog substrate can physically r…
arXiv:2604.22901v1 Announce Type: new Abstract: Diffusion models achieve remarkable success in time series generation. However, slow inference limits their practical deployment. We propose E$^2$-CRF (Error-Feedback Event-Driven Cumulative Residual Feature caching) to accelerate f…
arXiv cs.LG
TIER_1English(EN)·Weiguo Gao, Ming Li·
arXiv:2505.16024v2 Announce Type: replace Abstract: Diffusion trajectory distillation accelerates sampling by training a student model to approximate the multi-step denoising trajectories of a pretrained teacher model using far fewer steps. Despite strong empirical results, the t…
arXiv:2603.19670v3 Announce Type: replace Abstract: Nonasymptotic diffusion analyses often decompose sampling error into score estimation, continuous reverse-time propagation, discretization, and terminal conversion. We isolate the propagation module on certified scalar-isotropic…
Diffusion language models generate without a fixed left-to-right order, making token ordering a central algorithmic choice: which tokens should be revealed, retained, revised or verified at each step? Existing systems mainly use random masking or confidence-driven ordering. Rando…
Diffusion models are a leading paradigm for data generation, but training-free editing typically re-runs the full denoising trajectory for every edit strength, making iterative refinement expensive. To address this issue, we instead edit near the data manifold, where small local …
arXiv:2603.20092v4 Announce Type: replace Abstract: Diffusion models generate structure by progressively transforming noise into data, yet the mechanisms underlying this transition remain poorly understood. In this work, we show that pattern formation in trained diffusion models …
arXiv stat.ML
TIER_1English(EN)·Ziao Wang, Lei Ying·
arXiv:2606.12879v1 Announce Type: cross Abstract: This paper studies a variation of the classic network alignment problem, named diffusion-network alignment. The goal is to align the vertices of a rooted diffusion tree to the vertices of a network, where the diffusion tree could …
arXiv cs.CV
TIER_1English(EN)·Lidia Troeshestova, Alexander Ustyuzhanin, Sergey Kastryulin·
arXiv:2606.13303v1 Announce Type: new Abstract: Recent diffusion editors perform diverse instruction-based edits while conditioning on the source image at every denoising step. Yet persistent source-image conditioning can limit how fully an edit is executed and how natural the re…
Recent diffusion editors perform diverse instruction-based edits while conditioning on the source image at every denoising step. Yet persistent source-image conditioning can limit how fully an edit is executed and how natural the result appears, especially when the target scene d…
A key strength of diffusion models lies in their flexibility, since their outputs can be controlled at sampling time through guidance. However, beyond simple cases such as conditional sampling, the target distribution is often left implicit, defined only through a sampling rule o…
Model fingerprinting, embedding user-specific identifiers (fingerprints) into generated outputs, has recently emerged as a popular solution to protect the intellectual property rights (IPR) of generative text-to-image (T2I) models and prevent unauthorized redistribution. In this …
This paper studies a variation of the classic network alignment problem, named diffusion-network alignment. The goal is to align the vertices of a rooted diffusion tree to the vertices of a network, where the diffusion tree could be from a communication trace or contact tracing, …
arXiv stat.ML
TIER_1English(EN)·Lorenzo Bardone, Claudia Merger, Sebastian Goldt·
arXiv:2603.12901v2 Announce Type: replace Abstract: While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural…
arXiv:2606.12263v1 Announce Type: new Abstract: While Latent Diffusion Models (LDMs) have revolutionized visual synthesis, they are increasingly exploited for unauthorized mimicry of individuals. Existing defenses inject deceptive perturbations to steer the generated images towar…
arXiv cs.CV
TIER_1English(EN)·Ruitong Sun, Tianze Yang, Wei Niu, Jin Sun·
arXiv:2512.14096v2 Announce Type: replace Abstract: Diffusion Transformers (DiTs) have achieved remarkable success in image generation, yet their deployment is hindered by high computational costs. We identify two sources of redundancy. First, temporal redundancy: Classifier-Free…
While Latent Diffusion Models (LDMs) have revolutionized visual synthesis, they are increasingly exploited for unauthorized mimicry of individuals. Existing defenses inject deceptive perturbations to steer the generated images toward irrelevant targets. However, this approach hin…
arXiv stat.ML
TIER_1English(EN)·Dennis Elbr\"achter, Giovanni S. Alberti, Matteo Santacesaria·
arXiv:2509.24710v2 Announce Type: replace Abstract: Score-based diffusion models are a highly effective method for generating samples from a distribution of images. We consider scenarios where the training data comes from a noisy version of the target distribution, and present an…
arXiv cs.CV
TIER_1English(EN)·Zhengxuan Wei, Yi Dong, Zonghui Li, Xianhui Lin, Xing Liu, Hong Gu, Shaofeng Zhang, Wenbin Li, Qi Fan·
arXiv:2606.10617v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) merging can efficiently combine diverse generative capabilities from multiple trained LoRAs for a diffusion model. However, existing LoRA merging techniques often suffer from severe parameter interference,…
arXiv stat.ML
TIER_1English(EN)·Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero Simoncelli·
arXiv:2602.09639v2 Announce Type: replace-cross Abstract: Denoising diffusion models (DDMs) are state-of-the-art methods for learning densities from data across numerous domains, yet many aspects of the training and sampling pipeline remain poorly understood. In particular, noise…
Low-Rank Adaptation (LoRA) merging can efficiently combine diverse generative capabilities from multiple trained LoRAs for a diffusion model. However, existing LoRA merging techniques often suffer from severe parameter interference, causing destructive collisions in the shared pa…
arXiv stat.ML
TIER_1English(EN)·Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Gyawali, Gianfranco Doretto, Donald A. Adjeroh·
arXiv:2606.09257v1 Announce Type: cross Abstract: High-Dimensional Low-Sample Size (HDLSS) tabular domains (e.g., omics) are characterized by $n \ll m$, where $n$ = number of samples, and $m$ = number of features. Such domains often exhibit strong local correlation groups, sparse…
arXiv cs.CV
TIER_1English(EN)·Lianyu Pang, Tianlin Pan, Cheng Da, Changqian Yu, Huan Yang, Kun Gai, Song Guo, Wenhan Luo·
arXiv:2606.08788v1 Announce Type: new Abstract: Representation alignment with pretrained vision models has recently shown strong potential for accelerating diffusion transformer training. By aligning intermediate diffusion features with clean-image representations from self-super…
arXiv:2606.08438v1 Announce Type: new Abstract: Bayesian optimization (BO) is a widely used approach for black-box optimization that uses a Gaussian process (GP) as a surrogate and guides sequential evaluations via an acquisition function, with the ultimate goal of locating the g…
arXiv:2509.24531v2 Announce Type: replace Abstract: Diffusion Bridge and Flow Matching have both demonstrated compelling empirical performance in transformation between arbitrary distributions. However, there remains confusion about which approach is generally preferable, and the…
arXiv stat.ML
TIER_1English(EN)·Donald A. Adjeroh·
High-Dimensional Low-Sample Size (HDLSS) tabular domains (e.g., omics) are characterized by $n \ll m$, where $n$ = number of samples, and $m$ = number of features. Such domains often exhibit strong local correlation groups, sparse cross-group dependencies, heavy-tailed non-Gaussi…
Bayesian optimization (BO) is a widely used approach for black-box optimization that uses a Gaussian process (GP) as a surrogate and guides sequential evaluations via an acquisition function, with the ultimate goal of locating the global optimum $\mathbf{x}^{\star}$. To align wit…
arXiv stat.ML
TIER_1English(EN)·Hongfan Gao, Wangmeng Shen, Bin Yang, Jilin Hu·
arXiv:2606.05239v1 Announce Type: new Abstract: Diffusion models have demonstrated strong performance in time series modeling due to their ability to progressively capture complex data distributions through iterative denoising. However, existing approaches struggle with frequency…
arXiv cs.CV
TIER_1English(EN)·Weiyan Chen, Weijian Deng, Yao Xiao, Weijie Tu, ZiYi Dong, Ibrahim Radwan, Liang Lin, Pengxu Wei·
arXiv:2605.19839v2 Announce Type: replace Abstract: Preference alignment aims to guide generative models by learning from comparisons between preferred and non-preferred samples. In practice, most existing approaches rely on preference pairs constructed from model-generated image…
arXiv:2606.06477v1 Announce Type: new Abstract: Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance,…
arXiv:2606.06120v1 Announce Type: new Abstract: Contrastive Analysis aims to separate factors that are common between two data distributions from those that are salient to only one of them. Existing contrastive methods are based on generative models (e.g., VAEs or GANs) that ofte…
arXiv:2606.05883v1 Announce Type: new Abstract: Dataset condensation aims to construct compact datasets from real data via synthesis or selection. However, existing approaches are ill-suited for diffusion model training: synthetic data generation often yields low-fidelity samples…
arXiv:2510.10968v3 Announce Type: replace-cross Abstract: Derivative-free Bayesian inversion arises in science and engineering applications, particularly when forward model is costly or infeasible to differentiate through. Existing derivative-free methods collapse the posterior t…
arXiv stat.ML
TIER_1English(EN)·Na\"il B. Khelifa, Richard E. Turner, Ramji Venkataramanan·
arXiv:2606.06179v1 Announce Type: new Abstract: Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions.…
Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across th…
Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions. We show the $L^2$ score error is not the right …
Contrastive Analysis aims to separate factors that are common between two data distributions from those that are salient to only one of them. Existing contrastive methods are based on generative models (e.g., VAEs or GANs) that often suffer from limited reconstruction and image q…
Dataset condensation aims to construct compact datasets from real data via synthesis or selection. However, existing approaches are ill-suited for diffusion model training: synthetic data generation often yields low-fidelity samples unsuitable for authentic modeling, while real s…
arXiv:2606.04324v1 Announce Type: cross Abstract: One of the primary challenges in Bayesian inference on the parameters of a diffusion model from discrete observations is the unavailability of an analytical expression for the transition density function between consecutive observ…
Diffusion models have demonstrated strong performance in time series modeling due to their ability to progressively capture complex data distributions through iterative denoising. However, existing approaches struggle with frequency-sensitive denoising, high-frequency reconstruct…
arXiv:2606.03111v1 Announce Type: new Abstract: This paper studies the problem of inverting the DDIM image generation process to recover latent variables, particularly the initial noise map, from a generated image. Existing methods often struggle with accuracy in this task. We pr…
arXiv:2606.03578v1 Announce Type: new Abstract: Latent diffusion models leverage visual tokenizers to compress images into latent spaces for efficient generative modeling. However, better reconstruction quality of a tokenizer does not necessarily translate into better generation …
arXiv:2606.03903v1 Announce Type: new Abstract: Diffusion-weighted imaging (DWI) is used for whole-body cancer screening, but it typically requires a long acquisition time. When the scan time is reduced, the image quality often suffers, leading to increased noise in the scans. Ma…
arXiv:2601.22443v2 Announce Type: replace-cross Abstract: Can a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained on data that closely matc…
arXiv stat.ML
TIER_1English(EN)·Jungkyu Kim, Taeyoung Park, Kibok Lee·
arXiv:2606.03347v1 Announce Type: cross Abstract: Score-based diffusion models have emerged as prominent deep generative models; however, their application to tabular data remains challenging because their backbones assume fully specified inputs, whereas real-world tabular data o…
arXiv:2407.05312v2 Announce Type: replace Abstract: Diffusion models have demonstrated impressive image generation capabilities. Personalized approaches, such as textual inversion and Dreambooth, enhance model individualization using specific images. These methods enable generati…
arXiv:2604.06052v2 Announce Type: replace Abstract: Text-to-image diffusion models exhibit remarkable generative capabilities, yet their internal operations remain opaque, particularly when handling prompts that are not fully descriptive. In such scenarios, models must make impli…
arXiv stat.ML
TIER_1English(EN)·Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou·
arXiv:2606.03820v1 Announce Type: new Abstract: We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, …
One of the primary challenges in Bayesian inference on the parameters of a diffusion model from discrete observations is the unavailability of an analytical expression for the transition density function between consecutive observation times, which is needed to derive the likelih…
Diffusion-weighted imaging (DWI) is used for whole-body cancer screening, but it typically requires a long acquisition time. When the scan time is reduced, the image quality often suffers, leading to increased noise in the scans. Magnitude reconstruction in DWI introduces signal-…
We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be s…
Latent diffusion models leverage visual tokenizers to compress images into latent spaces for efficient generative modeling. However, better reconstruction quality of a tokenizer does not necessarily translate into better generation quality, suggesting that latent representations …
Score-based diffusion models have emerged as prominent deep generative models; however, their application to tabular data remains challenging because their backbones assume fully specified inputs, whereas real-world tabular data often contain missing values. We propose AugMask, a…
arXiv:2605.09503v2 Announce Type: replace Abstract: Large-scale visual generative models have achieved remarkable performance. However, their high computational and memory costs make deployment challenging in resource-constrained scenarios, such as interactive applications and pe…
arXiv:2606.01048v1 Announce Type: new Abstract: We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. While diffusion models have advanced I2I translation in terms of quality and diversity, we uncover a pr…
arXiv:2606.01645v1 Announce Type: new Abstract: Diffusion models have emerged as a leading framework for deep generative modeling. While the standard Gaussian formulation is theoretically convenient, its suitability for heavy-tailed datasets remains unclear. To address this, heav…
arXiv:2606.02115v1 Announce Type: new Abstract: Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drif…
arXiv stat.ML
TIER_1English(EN)·Florian Handke, Dejan Stan\v{c}evi\'c, Felix Koulischer, Thomas Demeester, Luca Ambrogioni·
arXiv:2602.09651v2 Announce Type: replace Abstract: Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynami…
arXiv stat.ML
TIER_1English(EN)·Kijung Jeon, Michael Muehlebach, Molei Tao·
arXiv:2604.17838v2 Announce Type: replace-cross Abstract: Generative modeling within constrained sets is essential for scientific and engineering applications involving physical, geometric, or safety requirements (e.g., molecular generation, robotics). We present a unified framew…
arXiv cs.CV
TIER_1English(EN)·Tao Wu, Senmao Li, Yaxing Wang, Shiqi Yang, Kai Wang, Joost van de Weijer·
arXiv:2606.01380v1 Announce Type: new Abstract: In this work, we introduce a novel training-free inversion (TFinv) framework for one-step diffusion models,addressing key challenges in real image inversion and editing. We first identify two critical factors hamperingreal-image inv…
arXiv:2606.02090v1 Announce Type: new Abstract: Diffusion transformer (DiT) has been widely adopted in the generative diffusion field, advancing the denoising of query tokens through attention and Feed-Forward (\text{FFN}) layers. FFN actually acts as the key-value vocabulary for…
Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using d…
Diffusion transformer (DiT) has been widely adopted in the generative diffusion field, advancing the denoising of query tokens through attention and Feed-Forward (\text{FFN}) layers. FFN actually acts as the key-value vocabulary for decoding visual contents where the value embeds…
arXiv:2605.31162v1 Announce Type: new Abstract: Unconditional diffusion models offer powerful generative priors, yet steering them toward aesthetically enhanced outputs remains largely unexplored. We show that h-space patching, the dominant paradigm for training-free diffusion ed…
arXiv:2605.30825v1 Announce Type: cross Abstract: Unlearning in diffusion models aims to remove undesirable data or concepts while preserving the utility of pretrained models -- two fundamentally conflicting objectives. We propose a principled constrained optimization framework t…
arXiv stat.ML
TIER_1English(EN)·Nail B. Khelifa, Richard E. Turner, Ramji Venkataramanan·
arXiv:2602.16601v2 Announce Type: replace Abstract: Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progre…
arXiv cs.CV
TIER_1English(EN)·Nathan Kessler, Robin Magnet, Jean Feydy·
arXiv:2507.06161v2 Announce Type: replace Abstract: Smoothing a signal based on local neighborhoods is a core operation in machine learning and geometry processing. On well-structured domains such as vector spaces and manifolds, the Laplace operator derived from differential geom…
Diffusion models have emerged as a leading framework for deep generative modeling. While the standard Gaussian formulation is theoretically convenient, its suitability for heavy-tailed datasets remains unclear. To address this, heavy-tailed diffusion models (HTDMs) extend the sta…
Unconditional diffusion models offer powerful generative priors, yet steering them toward aesthetically enhanced outputs remains largely unexplored. We show that h-space patching, the dominant paradigm for training-free diffusion editing, systematically fails for global, low-leve…
Unlearning in diffusion models aims to remove undesirable data or concepts while preserving the utility of pretrained models -- two fundamentally conflicting objectives. We propose a principled constrained optimization framework that formulates unlearning as minimizing the deviat…
arXiv stat.ML
TIER_1English(EN)·Tassilo Schwarz, Cai Dieball, Constantin Kogler, Renaud Lambiotte, Arnaud Doucet, Alja\v{z} Godec, George Deligiannidis·
arXiv:2510.08535v2 Announce Type: replace Abstract: Diffusion models are central to generative modeling and have been adapted to graphs by diffusing adjacency matrix representations. The challenge of having up to $n!$ such representations for graphs with $n$ nodes is only partial…
arXiv:2605.28962v1 Announce Type: new Abstract: Diffusion bridge models offer a powerful framework for connecting two data distributions, such as in image restoration and translation. Many existing methods learn this bridge by mimicking the score-matching formulation of standard …
arXiv:2605.30153v1 Announce Type: new Abstract: Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of …
arXiv:2605.30332v1 Announce Type: new Abstract: Diffusion models achieve state-of-the-art image synthesis, with their generative trajectories fundamentally exhibiting a spectral bias, resolving low-frequency global structures early and high-frequency fine details later. Conventio…
arXiv:2512.10401v3 Announce Type: replace Abstract: This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). Drawing on reparametrisation, we propose a new resampling method that is informative and instantly diffe…
Diffusion models achieve state-of-the-art image synthesis, with their generative trajectories fundamentally exhibiting a spectral bias, resolving low-frequency global structures early and high-frequency fine details later. Conventional stochastic differential equation (SDE) solve…
Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their statistical efficiency remains limited. Ex…
Forecasting the progression of neurodegenerative diseases, such as Parkinson's disease, is essential for effective long-term planning and personalized therapeutic intervention. Existing systems typically produce scalar clinical scores that ignore the rich structure of longitudina…
Large-scale text-to-image (T2I) diffusion models have enabled unprecedented creative applications, but their unauthorized use has raised serious intellectual property concerns, making model ownership verification (MOV) increasingly critical. We find that existing backdoor-based d…
arXiv:2512.06797v2 Announce Type: replace-cross Abstract: Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. This includes sampling via diffusion methods, optimizing the weights of neural networks, a…
arXiv cs.CV
TIER_1Deutsch(DE)·Sean Man, Gilad Deutch, Roy Ganz, Roi Ronen, Shahar Tsiper, Shai Mazor, Niv Nayman·
arXiv:2602.16872v2 Announce Type: replace Abstract: Optical Character Recognition (OCR) is a fundamental task for digitizing information, serving as a critical bridge between visual data and textual understanding. While modern Vision-Language Models (VLM) have achieved high accur…
arXiv cs.CV
TIER_1English(EN)·Shangwen Zhu, Han Zhang, Zhantao Yang, Qianyu Peng, Zhao Pu, Huangji Wang, Fan Cheng·
arXiv:2503.09675v3 Announce Type: replace Abstract: Text-based diffusion models have made significant breakthroughs in generating high-quality images and videos from textual descriptions. However, the lengthy sampling time of the denoising process remains a significant bottleneck…
Diffusion posterior sampling conditions diffusion priors on measurements, but data-consistency updates are typically scaled by hand-tuned guidance weights and can destabilize sampling under stiff, operator-dependent curvature. We replace scalar guidance with a per-noise-level dam…
arXiv:2605.27352v1 Announce Type: cross Abstract: Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration met…
arXiv stat.ML
TIER_1English(EN)·Hyunmo Kang, Noam Itzhak Levi, Corinna Elena Wegner, Daniel J. Korchinski, Matthieu Wyart·
arXiv:2605.27006v1 Announce Type: cross Abstract: Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step pr…
arXiv cs.CV
TIER_1English(EN)·Junseo Bang, Dong Ju Mun, Hoigi Seo, Seongmin Hong, Se Young Chun·
arXiv:2605.26470v1 Announce Type: new Abstract: Generative posterior sampling using diffusion models has emerged as a dominant paradigm for solving inverse problems in imaging, which usually consists of three main components: data consistency (DC) guidance, classifier-free guidan…
arXiv cs.CV
TIER_1(AF)·Felix Krause, Stefan Andreas Baumann, Johannes Schusterbauer, Olga Grebenkova, Ming Gui, Vincent Tao Hu, Bj\"orn Ommer·
arXiv:2601.01608v2 Announce Type: replace Abstract: Diffusion models deliver high quality in image synthesis but remain expensive during training and inference. Recent works have leveraged the inherent redundancy in visual content to make training more affordable by training only…
Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities…
Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step proposes a move that remains on the learned data man…
arXiv:2601.20273v2 Announce Type: replace-cross Abstract: Diffusion Transformers (DiTs) have gained increasing adoption in high-quality image and video generation. As demand for higher-resolution images and longer videos increases, single-GPU inference becomes inefficient due to …
arXiv:2605.25042v1 Announce Type: new Abstract: Existing score-based methods for inverse problems often resort to approximate minimization of the KL divergence between the inversion distribution and the Bayesian posterior. Such an approximation leads to severe mode collapse and u…
arXiv:2605.24870v1 Announce Type: new Abstract: Diffusion Transformers require repeated denoiser evaluations during iterative sampling, making inference computationally expensive. Cache-based acceleration reduces this cost by reusing intermediate representations across denoising …
arXiv:2605.25191v1 Announce Type: new Abstract: Text-to-image diffusion models like Stable Diffusion generate high-quality images from text, but lack a way to inject visual guidance (e.g. sketches, styles) at inference without retraining. Existing methods either require computati…
arXiv:2605.23451v1 Announce Type: new Abstract: Real-world image super-resolution aims to recover high-quality images from complex and unknown real-world degradations. However, existing generative Real-ISR methods largely inherit the dense latent representations and quadratic-cos…
Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or multiple environments in robotics with diffusion …
We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without updating its weights. Recent Sequential Monte Carlo (SMC)-based steering methods approximate reward-tilted target distributions in a principled…
Real-world image super-resolution aims to recover high-quality images from complex and unknown real-world degradations. However, existing generative Real-ISR methods largely inherit the dense latent representations and quadratic-cost global modeling paradigm developed for high-re…
arXiv:2605.22011v1 Announce Type: new Abstract: Diffusion Transformers (DiTs) achieve superior image generation quality but suffer from quadratic computational complexity relative to token count. While various token reduction (TR) methods have been proposed to mitigate this cost,…
arXiv cs.CV
TIER_1English(EN)·Le Zhang, Ning Mang, Aishwarya Agrawal·
arXiv:2605.21981v1 Announce Type: new Abstract: Flow matching with $x$-prediction -- regressing the clean data point rather than the ambient velocity -- is known to exploit low-dimensional manifold structure effectively in pixel space \cite{li2025back}. We ask whether a pretraine…
Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; thei…
arXiv stat.ML
TIER_1English(EN)·Benjamin Sterling, M\'onica F. Bugallo, Tom Tirer·
arXiv:2605.19170v1 Announce Type: new Abstract: Diffusion/score-based models have emerged as powerful generative models, capable of generating high-quality samples that mimic the training data distribution. However, it has been observed that they are prone to reproducing training…
arXiv:2605.19391v1 Announce Type: new Abstract: Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise, …
Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise, transforming it into a simple prior, and then us…
arXiv stat.ML
TIER_1English(EN)·Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis·
arXiv:2605.17232v1 Announce Type: cross Abstract: Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses d…
arXiv:2605.17850v1 Announce Type: new Abstract: iffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require r…
Diffusion/score-based models have emerged as powerful generative models, capable of generating high-quality samples that mimic the training data distribution. However, it has been observed that they are prone to reproducing training samples-known as "memorization"-potentially vio…
iffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introduci…
arXiv stat.ML
TIER_1English(EN)·Markos A. Katsoulakis·
Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked di…
Despite strong image-generation performance, diffusion models' reconstruction objectives limit alignment with human preferences. RL enables such alignment through explicit rewards. However, most studies apply RL to the full denoising trajectory, making it computationally costly a…
arXiv:2605.13448v1 Announce Type: new Abstract: Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a sour…
arXiv:2602.06021v2 Announce Type: replace Abstract: We study a data-dependent notion of diffusion-model generalization: when a model does not memorize the training set, where do its generated samples go relative to the geometry induced by the data? To answer this, we introduce a …
arXiv:2602.02791v2 Announce Type: replace Abstract: We study supervised multiclass classification for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete times. We first derive a multidimensional Bayes rule…
arXiv:2510.18114v3 Announce Type: replace-cross Abstract: Discrete diffusion models have emerged as a powerful class of models and a promising route to fast language generation, but practical implementations typically rely on factored reverse transitions ignoring cross-token depe…
Action diffusion excels at high-fidelity action generation but incurs heavy computational costs owing to its iterative denoising nature. Despite current technologies showing promise in accelerating diffusion transformers by reusing the cached features, they struggle to adapt to p…
We present a covariance-aware sampler that improves the quality of pixel-space Diffusion Model (DM) sampling in the few-step regime. We hypothesize that in the few-step regime samplers fail because they rely solely on the predicted mean of the reverse distribution, while our solu…
arXiv:2506.23619v2 Announce Type: replace-cross Abstract: This paper investigates the impact of posterior drift on out-of-sample forecasting accuracy in overparametrized machine learning models. We document the loss in performance when the loadings of the data generating process …
arXiv stat.ML
TIER_1English(EN)·Hao Chen, Renzheng Zhang, Scott S. Howard·
arXiv:2511.17038v3 Announce Type: replace-cross Abstract: From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its prac…
arXiv:2605.11311v1 Announce Type: cross Abstract: Diffusion models typically generate image batches from independent Gaussian initial noises. We argue that this independence assumption is only one choice within a broader class of valid joint noise designs. Instead, one can specif…
Recent advances in generative models have yielded impressive progress on motion in-betweening, allowing for more complex, varied, and realistic motion transitions. However, recent methods still exhibit noticeable limitations in preserving keyframe information and ensuring motion …
Unlearning specific concepts in text-to-image diffusion models has become increasingly important for preventing undesirable content generation. Among prior approaches, sparse autoencoder (SAE)-based methods have attracted attention due to their ability to suppress target concepts…
Class imbalance is a persistent challenge in visual recognition, particularly in safety-critical domains where collecting positive examples is expensive and rare events are inherently underrepresented. We propose a lightweight synthetic data augmentation pipeline that fine-tunes …
Diffusion models typically generate image batches from independent Gaussian initial noises. We argue that this independence assumption is only one choice within a broader class of valid joint noise designs. Instead, one can specify a coupling of the initial noises: each noise rem…
Low-Rank Adaptation (LoRA) has become a widely used mechanism for customizing diffusion models, enabling users to inject new visual concepts or styles through lightweight parameter updates. However, LoRAs can memorize training images, causing generated outputs to reproduce copyri…
arXiv stat.ML
TIER_1English(EN)·Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello·
arXiv:2602.00716v4 Announce Type: replace Abstract: Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often reduces sample diversity. Using tools from statistical physics, we analyze the emergence of generative distortion…
arXiv stat.ML
TIER_1English(EN)·Dongqing Li, Geoff K. Nicholls, Shiyi Sun, You Luo·
arXiv:2605.06976v1 Announce Type: new Abstract: Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary lin…
arXiv:2605.07818v1 Announce Type: new Abstract: The expectation--maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local con…
arXiv:2605.07625v1 Announce Type: cross Abstract: Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learnin…
arXiv stat.ML
TIER_1English(EN)·James Matthew Young, Paula Cordero-Encinar, Sebastian Reich, Andrew Duncan, O. Deniz Akyildiz·
arXiv:2601.21951v2 Announce Type: replace Abstract: We develop diffusion-based samplers for target distributions known up to a normalising constant. To this end, we rely on the well-known diffusion path that smoothly interpolates between a simple base distribution and the target,…
Sampling from score-based diffusion models incurs bias due to both time discretisation and the approximation of the score function. A common strategy for reducing this bias is to apply corrector steps based on the unadjusted Langevin algorithm (ULA) at each noise level within a p…
Text-to-image diffusion models can generate visually stunning images, yet, controlling what appears and how it appears, remains surprisingly difficult, especially when operating solely within the constraints of the text-conditioning space. For example, changing a subject or adjus…
Tokenizers are a crucial component of latent diffusion models, as they define the latent space in which diffusion models operate. However, existing tokenizers are primarily designed to improve reconstruction fidelity or inherit pretrained representations, leaving unclear what kin…
The expectation--maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local convergence is governed by the spectrum of the line…
Recent video diffusion models (VDMs) synthesize visually convincing clips, yet still drop entities, mis-bind attributes, and weaken the interactions specified in the prompt. Representation-alignment objectives such as VideoREPA and MoAlign improve fine-grained text following by d…
Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learning models. First hitting diffusion models (FHDM) ar…
Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary linearization rather than true prerequisites. We in…
Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learn…
Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practice d…
Real-world datasets are inherently heterogeneous, yet how per-class structural differences and sampling imbalance shape the training dynamics of diffusion models-and potentially exacerbate disparities-remains poorly understood. While models typically transition from an initial ph…
Many normalizing flow architectures impose regularity constraints, yet their distributional approximation properties are not fully characterized. We study the expressivity of bi-Lipschitz normalizing flows through the lens of score-based diffusion models. For the probability flow…
arXiv cs.CV
TIER_1English(EN)·Bartlomiej Sobieski, Matthew Tivnan, Dawid P{\l}udowski, Micha{\l} Jan W{\l}odarczyk, Pengfei Jin, Przemyslaw Biecek, Quanzheng Li·
arXiv:2605.05026v1 Announce Type: new Abstract: Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fing…
arXiv:2605.04412v1 Announce Type: new Abstract: 3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllab…
We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in g…
Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of pr…
Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research studied this failure mode f…
Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose \HEDGE, a generative model defined directly on …
arXiv:2605.03877v1 Announce Type: new Abstract: Dataset distillation enables efficient training by distilling the information of large-scale datasets into significantly smaller synthetic datasets. Diffusion based paradigms have emerged in recent years, offering novel perspectives…
arXiv:2605.03317v1 Announce Type: new Abstract: Representation alignment has recently emerged as an effective paradigm for accelerating Diffusion Transformer training. Despite their success, existing alignment methods typically impose a fixed supervision target or a fixed alignme…
arXiv:2605.01107v1 Announce Type: cross Abstract: Neural networks transform data through learned representations whose geometry affects separation, contraction, and generalization. Recent work studies this geometry using discrete curvature on neighborhood graphs, suggesting Ricci…
arXiv cs.CV
TIER_1English(EN)·Fangzheng Wu, Brian Summa·
arXiv:2605.01653v1 Announce Type: new Abstract: We introduce SteeringDiffusion, a bottlenecked activation-level control interface for diffusion models that exposes a smooth, monotonic, and runtime-adjustable control surface over the content--style trade-off. Our method keeps the …
arXiv:2605.00935v1 Announce Type: cross Abstract: Diffusion models have become the foundation of modern generative systems, with most research focusing primarily on improving generation efficiency and output quality. The timestep embedding component is a crucial part of the diffu…
arXiv cs.CV
TIER_1English(EN)·Xun Su, Hiroyuki Kasai·
arXiv:2510.23633v2 Announce Type: replace-cross Abstract: Pretrained diffusion models have demonstrated strong capabilities in zero-shot inverse problem solving by incorporating observation information into the generation process of the diffusion models. However, this presents an…
Representation alignment has recently emerged as an effective paradigm for accelerating Diffusion Transformer training. Despite their success, existing alignment methods typically impose a fixed supervision target or a fixed alignment granularity throughout the entire denoising t…
arXiv:2511.07756v4 Announce Type: replace Abstract: Diffusion models initialize generation from an isotropic Gaussian latent, yet changing only the random seed can substantially alter prompt faithfulness, composition, and visual quality. We explain this gap by distinguishing the …
arXiv cs.CV
TIER_1English(EN)·Saeed Mohseni-Sehdeh, Walid Saad, Kei Sakaguchi, Tao Yu·
arXiv:2507.18654v2 Announce Type: replace-cross Abstract: Diffusion models are powerful tools for sampling from high-dimensional distributions by progressively transforming pure noise into structured data through a denoising process. When equipped with a guidance mechanism, these…
arXiv cs.CV
TIER_1English(EN)·Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros·
arXiv:2601.00090v2 Announce Type: replace Abstract: Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. Previous work has attempted to address this issue by steering the model usin…
arXiv:2406.14429v3 Announce Type: replace-cross Abstract: In the landscape of generative artificial intelligence, diffusion-based models have emerged as a promising method for generating synthetic images. However, the application of diffusion models poses numerous challenges, par…
Neural networks transform data through learned representations whose geometry affects separation, contraction, and generalization. Recent work studies this geometry using discrete curvature on neighborhood graphs, suggesting Ricci-flow-like behavior across layers. We develop a sm…
arXiv stat.ML
TIER_1English(EN)·Michael S. Albergo·
We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density; a formulation that subsumes both sampling from unnormalized densities and reward fine-tuning of pre-trained models. This …
arXiv:2604.26365v1 Announce Type: new Abstract: To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping.…
arXiv:2604.26503v1 Announce Type: new Abstract: Diffusion models have achieved remarkable success in synthesizing complex static and temporal visuals, a breakthrough largely driven by Classifier-Free Guidance (CFG). However, despite its pivotal role in aligning generated content …
arXiv cs.CV
TIER_1English(EN)·Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski·
arXiv:2603.03239v2 Announce Type: replace Abstract: Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover. Relationships between modalities are fundamental for data integration but are inherently non-in…
arXiv cs.CV
TIER_1English(EN)·Yang Yang, Feifan Meng, Han Fang, Weiming Zhang·
arXiv:2604.26348v1 Announce Type: new Abstract: Diffusion models have achieved remarkable success in image generation, yet their training is predominantly driven by full-reference objectives that enforce pixel-wise similarity to ground-truth images.Such supervision, while effecti…
arXiv:2405.13729v3 Announce Type: replace-cross Abstract: In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, a…
Diffusion models have achieved remarkable success in synthesizing complex static and temporal visuals, a breakthrough largely driven by Classifier-Free Guidance (CFG). However, despite its pivotal role in aligning generated content with textual prompts, standard CFG relies on a g…
To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a …
Diffusion models have achieved remarkable success in image generation, yet their training is predominantly driven by full-reference objectives that enforce pixel-wise similarity to ground-truth images.Such supervision, while effective for fidelity, may insufficient in terms of su…
arXiv:2604.25289v1 Announce Type: cross Abstract: Practically, training diffusion models typically requires explicit time conditioning to guide the network through the denoising sampling process. Especially in deterministic methods like DDIM, the absence of time conditioning lead…
arXiv cs.CV
TIER_1English(EN)·Nishit Anand, Manan Suri, Christopher Metzler, Dinesh Manocha, Ramani Duraiswami·
arXiv:2604.24877v1 Announce Type: new Abstract: Controlling illumination in images is essential for photography and visual content creation. While closed-source models have demonstrated impressive illumination control, open-source alternatives either require heavy control inputs …
Practically, training diffusion models typically requires explicit time conditioning to guide the network through the denoising sampling process. Especially in deterministic methods like DDIM, the absence of time conditioning leads to significant performance degradation. However,…
arXiv cs.CV
TIER_1English(EN)·Zhongjie Duan, Hong Zhang, Yingda Chen·
arXiv:2604.24351v1 Announce Type: cross Abstract: Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats,…
arXiv:2604.23536v1 Announce Type: new Abstract: Diffusion models have achieved unprecedented success in text-aligned generation, largely driven by Classifier-Free Guidance (CFG). However, standard CFG operates strictly on instantaneous gradients, omitting the intrinsic curvature …
arXiv:2602.04749v3 Announce Type: replace Abstract: Long-tailed class imbalance remains a fundamental obstacle in semantic segmentation of high-resolution remote-sensing imagery, where dominant classes shape learned representations and rare classes are systematically under-segmen…
arXiv stat.ML
TIER_1English(EN)·Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin·
arXiv:2602.01338v2 Announce Type: replace-cross Abstract: We present algorithms for diffusion model sampling which obtain $\delta$-error in $\mathrm{polylog}(1/\delta)$ steps, given access to $\widetilde O(\delta)$-accurate score estimates in $L^2$. This is an exponential improve…
arXiv:2604.23552v1 Announce Type: cross Abstract: Diffusion models are central to modern generative modeling, and understanding how they balance memorization and generalization is critical for reliable deployment. Recent work has shown that memorization in diffusion models is sha…
Controlling illumination in images is essential for photography and visual content creation. While closed-source models have demonstrated impressive illumination control, open-source alternatives either require heavy control inputs like depth maps or do not release their data and…
Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it di…
arXiv cs.CV
TIER_1English(EN)·Mingxing Rao, Bowen Qu, Daniel Moyer·
arXiv:2509.25003v2 Announce Type: replace-cross Abstract: Membership inference attacks (MIAs) against Diffusion Models (DMs) raise pressing privacy concerns by revealing whether a sample was part of the training set. While existing methods typically rely on measuring reconstructi…
arXiv:2604.22379v1 Announce Type: new Abstract: Recent advances in distilling expensive diffusion models into efficient few-step generators show significant promise. However, these methods typically demand substantial computational resources and extended training periods, limitin…
Diffusion models are central to modern generative modeling, and understanding how they balance memorization and generalization is critical for reliable deployment. Recent work has shown that memorization in diffusion models is shaped by training dynamics, with generalization and …
Recent advances in distilling expensive diffusion models into efficient few-step generators show significant promise. However, these methods typically demand substantial computational resources and extended training periods, limiting accessibility for resource-constrained researc…
Diffusion-based generative models have reformed generative AI, and have enabled new capabilities in the science domain, for example, generating 3D structures of molecules. Due to the intrinsic problem structure of certain tasks, there is often a symmetry in the system, which iden…
Selecting an appropriate kernel is a central challenge in kernel-based spectral methods. In \emph{Kernelized Diffusion Maps} (KDM), the kernel determines the accuracy of the RKHS estimator of a diffusion-type operator and hence the quality and stability of the recovered eigenfunc…
Generative modeling within constrained sets is essential for scientific and engineering applications involving physical, geometric, or safety requirements (e.g., molecular generation, robotics). We present a unified framework for constrained diffusion models on generic nonconvex …
**Stability AI** launched **Stable Diffusion 3 Medium** with models ranging from **450M to 8B parameters**, featuring the MMDiT architecture and T5 text encoder for image text rendering. The community has shown mixed reactions following the departure of key researchers like Emad …
**Over 2500 new community members joined following Soumith Chintala's shoutout, highlighting growing interest in SOTA LLM-based summarization. The major highlight is the detailed paper release of **Stable Diffusion 3 (SD3)**, showcasing advanced text-in-image control and complex …
<p>The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episo…
<p>Zyphra's latest release shows that an autoregressive MoE model can be converted into a discrete diffusion model with no systematic loss in evaluation performance. ZAYA1-8B-Diffusion-Preview achieves up to 7.7x inference speedup over autoregression by shifting decoding from mem…
<table> <tr><td> <a href="https://www.reddit.com/r/LocalLLaMA/comments/1u30dhr/diffusiongemma_under_real_workloads_feels_very/"> <img alt="DiffusionGemma under real workloads feels very different from benchmark demos" src="https://preview.redd.it/zrnom6hrwn6h1.jpeg?width=640&…
<!-- SC_OFF --><div class="md"><p>I’m building a pipeline to turn 4-side product photos into professional studio images. I’m currently using SAM 2 for segmentation and an Inpainting pipeline to generate the studio background, but the model keeps hallucinating or degrading the pro…
<!-- SC_OFF --><div class="md"><p>I’ve been switching between a few models lately but I still can’t find something that feels both fast and consistently clean in results. Some models look great but slow everything down, while others are fast but lose detail pretty quickly. Even w…