PulseAugur
research · [33 sources]

New AI methods advance causal discovery for complex, noisy, and large-scale data

Several recent arXiv papers introduce novel methods and benchmarks for causal discovery, a field focused on identifying cause-and-effect relationships from data. These advancements include techniques for handling noisy or incomplete data, integrating expert knowledge, and improving scalability for large datasets. New benchmarks and testing frameworks are also being developed to rigorously evaluate the robustness of existing causal discovery algorithms against various assumption violations, particularly in time-series data and natural language reasoning.

Summary written by gemini-2.5-flash-lite from 33 sources.

IMPACT Advances in causal discovery methods could lead to more reliable AI systems capable of understanding and reasoning about cause-and-effect relationships, particularly in complex or noisy environments.
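Several of the covered papers (the LiNGAM-related work in particular) rest on the additive-noise idea: in the true causal direction, regression residuals are independent of the regressor, while in the reverse direction they are not, provided the noise is non-Gaussian. The sketch below is illustrative only, not the method of any covered paper; the data-generating model, the correlation-of-squares dependence score, and all names are invented for the example.

```python
# Minimal sketch (not any covered paper's method): pairwise causal
# direction under a linear non-Gaussian additive-noise assumption.
# Regress each way and check which residual looks independent of its
# regressor, using correlation of squares as a crude dependence score.
import numpy as np

def direction_score(a, b):
    """Regress b on a; return |corr(a^2, resid^2)|.

    Lower means the residual looks more independent of a, i.e. the
    direction a -> b is more plausible under the additive-noise idea.
    """
    slope = np.cov(a, b)[0, 1] / np.var(a)
    resid = b - slope * a
    return abs(np.corrcoef(a**2, resid**2)[0, 1])

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1, 1, n)               # non-Gaussian cause
y = 2 * x + rng.uniform(-0.5, 0.5, n)   # linear mechanism, uniform noise

forward = direction_score(x, y)   # near zero: residual independent of x
backward = direction_score(y, x)  # clearly nonzero: residual depends on y
inferred = "x -> y" if forward < backward else "y -> x"
print(inferred)  # prints "x -> y"
```

With Gaussian noise both residuals would be independent of their regressors and the directions would be indistinguishable, which is exactly the fragility the mean-independence and robustness papers below probe; real methods replace the correlation-of-squares score with calibrated independence tests.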

RANK_REASON Multiple arXiv papers published on May 7, 2026, detailing new methods and benchmarks for causal discovery.


COVERAGE [33]

  1. arXiv cs.LG TIER_1 · Francesco Locatello ·

    Causal Learning with the Invariance Principle

    Causal discovery, the problem of inferring the direction of causality, is generally ill-posed. We use the language of structural causal models (SCM) to show that assuming that the causal relations are acyclic and invariant across multiple environments (e.g., the way minimum wage …

  2. arXiv cs.AI TIER_1 · Francesco Locatello ·

    Towards a holistic understanding of Selection Bias for Causal Effect Identification

    Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit "healthy volunteer bias" when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from s…

  3. Hugging Face Daily Papers TIER_1 ·

    A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables

    Constraint-based causal discovery is widely used for learning causal structures, but heavy reliance on conditional independence (CI) testing makes it computationally expensive in high-dimensional settings. To mitigate this limitation, many divide-and-conquer frameworks have been …

  4. Hugging Face Daily Papers TIER_1 ·

    ConfoundingSHAP: Quantifying confounding strength in causal inference

    In causal inference, confounders are variables that influence both treatment decisions and outcomes. However, unlike in randomized clinical trials, the treatment assignment mechanism in observational studies is not known, and it is thus unclear which covariates act as confound…

  5. arXiv cs.AI TIER_1 · Alex Markham ·

    Coarsening Linear Non-Gaussian Causal Models with Cycles

    Recent work on causal abstraction, in particular graphical approaches focusing on causal structure between clusters of variables, aims to summarize a high-dimensional causal structure in terms of a low-dimensional one. Existing methods for learning such summaries from data assume…

  6. arXiv cs.LG TIER_1 · Chris J. Maddison ·

    Bayesian Sensitivity of Causal Inference Estimators under Evidence-Based Priors

    Causal inference, especially in observational studies, relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps us determine how robust our conclusions are when we alter these underlying assumptions. Existing frameworks for sensitivity a…

  7. arXiv cs.LG TIER_1 · Joseph D. Ramsey ·

    Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring and FFCI Testing in Mixed Data

    arXiv:2605.05743v1: Gaussian process marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present two complementary RFF-based meth…

  8. arXiv cs.LG TIER_1 · Shicheng Fan, Nour Elhendawy, Jianle Sun, Ke Fang, Kun Zhang, Yihang Wang, Lu Cheng ·

    MOSAIC: Module Discovery via Sparse Additive Identifiable Causal Learning for Scientific Time Series

    arXiv:2605.05524v1: Causal representation learning (CRL) seeks to recover latent variables with identifiability guarantees, typically up to permutation and component-wise reparameterization under appropriate assumptions. However, identifiability does n…

  9. arXiv cs.LG TIER_1 · Sunmin Oh, Sang-Yun Oh, Gunwoong Park ·

    Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale

    arXiv:2605.05568v1: Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesk…

  10. arXiv cs.LG TIER_1 · Adrick Tench, Thomas Demeester ·

    Dynamic Expert-Guided Model Averaging for Causal Discovery

    arXiv:2601.16715v2: Would-be practitioners of causal discovery face a dizzying array of algorithms without a clear best choice. This abundance of competitive methods makes ensembling a natural strategy for practical applications. At the same time, …

  11. arXiv cs.LG TIER_1 · Marvin Sextro, Weronika Kłos, Gabriel Dernbach ·

    MapPFN: Learning Causal Perturbation Maps in Context

    arXiv:2601.21092v3: Planning effective interventions in biological systems requires treatment-effect models that adapt to unseen biological contexts by identifying their specific underlying mechanisms. Yet single-cell perturbation datasets span onl…

  12. arXiv cs.LG TIER_1 · Thomas S. Robinson, Ranjit Lall ·

    PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data

    arXiv:2605.04838v1: The standard constraint-based paradigm for causal discovery with incomplete data -- impute first, test second -- is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability a…

  13. arXiv cs.LG TIER_1 · Geert Mesters, Alvaro Ribot, Anna Seigal, Piotr Zwiernik ·

    Causal discovery under mean independence and linearity

    arXiv:2605.04381v1: Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of depend…

  14. arXiv cs.CL TIER_1 · Zhi Xu, Yun Fu ·

    NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise

    arXiv:2605.04313v1: Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (L…

  15. arXiv cs.LG TIER_1 · Bruno Petrungaro, Anthony C. Constantinou ·

    Time series causal discovery with variable lags

    arXiv:2605.04081v1: Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs req…

  16. arXiv cs.LG TIER_1 · Gideon Stein, Niklas Penzel, Tristan Piater, Joachim Denzler ·

    TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations

    arXiv:2605.03045v1: Causal Discovery (CD) is a powerful framework for scientific inquiry. Yet, its practical adoption is hindered by a reliance on strong, often unverifiable assumptions and a lack of robust performance assessment. To address these limi…

  17. Hugging Face Daily Papers TIER_1 ·

    Causal discovery under mean independence and linearity

    Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of dependence can cause the methods to recover the wrong ca…

  18. arXiv cs.CL TIER_1 · Yun Fu ·

    NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise

    Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (LLMs) exhibit strong general reasoning abilities,…

  19. arXiv stat.ML TIER_1 · Jin Du, Li Chen, Xun Xian, An Luo, Fangqiao Tian, Ganghua Wang, Charles Doss, Xiaotong Shen, Jie Ding ·

    Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference

    arXiv:2505.13770v3: Reliable causal inference is essential for making decisions in high-stakes areas like medicine, economics, and public policy. However, it remains unclear whether large language models (LLMs) can handle rigorous and trustwo…

  20. arXiv stat.ML TIER_1 · Oliver J. Hines, Caleb H. Miles ·

    Learning density ratios in causal inference using Bregman-Riesz regression

    arXiv:2510.16127v2: The ratio of two probability density functions is a fundamental quantity that appears in many areas of statistics and machine learning, including causal inference, reinforcement learning, covariate shift, outlier detection, inde…

  21. arXiv stat.ML TIER_1 · Hao Zhang ·

    A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables

    Constraint-based causal discovery is widely used for learning causal structures, but heavy reliance on conditional independence (CI) testing makes it computationally expensive in high-dimensional settings. To mitigate this limitation, many divide-and-conquer frameworks have been …

  22. arXiv stat.ML TIER_1 · Stefan Feuerriegel ·

    Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks

    Causal sensitivity analysis aims to provide bounds for causal effect estimates in the presence of unobserved confounding. However, existing methods for causal sensitivity analysis are per-instance procedures, meaning that changes to the dataset, causal query, sensitivity level, o…

  23. arXiv stat.ML TIER_1 (CA) · Tobias Maringgele, Jalal Etesami ·

    Optimal Experiments for Partial Causal Effect Identification

    arXiv:2605.06993v1: Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcome…

  24. arXiv stat.ML TIER_1 · Shakeel Gavioli-Akilagun, Kieran Wood, Francesco Quinzan ·

    Detecting Changes in Causal Dependence with Kernels and Copulas

    arXiv:2605.05809v1: We propose a framework for determining whether the causal dependence of an outcome Y on a covariate X changes at a given time point, given confounders Z. For instance, in financial markets, the effect of a marke…

  25. arXiv stat.ML TIER_1 (CA) · Jalal Etesami ·

    Optimal Experiments for Partial Causal Effect Identification

    Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that m…

  26. arXiv stat.ML TIER_1 · Francesco Quinzan ·

    Detecting Changes in Causal Dependence with Kernels and Copulas

    We propose a framework for determining whether the causal dependence of an outcome Y on a covariate X changes at a given time point, given confounders Z. For instance, in financial markets, the effect of a market indicator on asset returns may causally change o…

  27. arXiv stat.ML TIER_1 · Joseph D. Ramsey ·

    Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring and FFCI Testing in Mixed Data

    Gaussian process marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present two complementary RFF-based methods forming a practical toolkit for score-based, c…

  28. arXiv stat.ML TIER_1 · Gunwoong Park ·

    Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale

    Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recov…

  29. arXiv stat.ML TIER_1 · Ranjit Lall ·

    PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data

    The standard constraint-based paradigm for causal discovery with incomplete data -- impute first, test second -- is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spuriou…

  30. arXiv stat.ML TIER_1 · Piotr Zwiernik ·

    Causal discovery under mean independence and linearity

    Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of dependence can cause the methods to recover the wrong ca…

  31. arXiv stat.ML TIER_1 · Xihang Shan, Da Zhou ·

    PRCD-MAP: Learning How Much to Trust Imperfect Priors in Causal Discovery

    arXiv:2605.01669v1: External priors of unknown reliability create a brittle trade-off in causal discovery: blind trust amplifies errors, blind rejection wastes signal. Real priors are also heterogeneously reliable -- physical laws are trustworth…

  32. arXiv stat.ML TIER_1 · Da Zhou ·

    PRCD-MAP: Learning How Much to Trust Imperfect Priors in Causal Discovery

    External priors of unknown reliability create a brittle trade-off in causal discovery: blind trust amplifies errors, blind rejection wastes signal. Real priors are also heterogeneously reliable -- physical laws are trustworthy, LLM-suggested edges are speculative -- yet ex…

  33. Towards AI TIER_1 · Ruiz Rivera ·

    Rethinking Predictors: Why Causal Reasoning Matters in Data Science (Part 1)
