PulseAugur

New algorithm efficiently finds most influential data sets

Researchers have developed an algorithm for finding most influential sets: size-k subsets of a dataset whose removal maximally changes a target estimand. A direct search is typically infeasible because it must consider every one of the $\binom{n}{k}$ possible subsets. For estimands with linear-fractional leave-set-out effects, the new approach, based on Dinkelbach's method, reduces the search to a sequence of top-k problems. Such sets matter because removing just a few data points can overturn a model's key conclusions.
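The reduction to top-k problems can be illustrated with a minimal sketch of textbook Dinkelbach iteration. It is not the authors' implementation: the objective form (a ratio of two sums over the selected points), the function name `dinkelbach_top_k`, and the parameters `a`, `b`, `a0`, `b0` are assumptions made for illustration, and the denominator is assumed to be positive for every size-k subset.

```python
import heapq


def dinkelbach_top_k(a, b, k, a0=0.0, b0=0.0, tol=1e-12, max_iter=100):
    """Maximize (a0 + sum a[i]) / (b0 + sum b[i]) over index sets of size k.

    Hypothetical illustration: assumes the denominator is positive for
    every size-k subset, which is checked below.
    """
    n = len(a)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(a)")
    if b0 + sum(sorted(b)[:k]) <= 0:
        raise ValueError("denominator must be positive for every size-k subset")

    def numerator(subset):
        return a0 + sum(a[i] for i in subset)

    def denominator(subset):
        return b0 + sum(b[i] for i in subset)

    # Start from an arbitrary subset and its ratio.
    best = list(range(k))
    lam = numerator(best) / denominator(best)

    for _ in range(max_iter):
        # For fixed lam, maximizing numerator - lam * denominator separates
        # across points, so it is a plain top-k selection on per-point scores.
        def score(i):
            return a[i] - lam * b[i]

        candidate = heapq.nlargest(k, range(n), key=score)
        gap = numerator(candidate) - lam * denominator(candidate)
        if gap <= tol:
            break  # no subset improves on the current ratio
        best = candidate
        lam = numerator(best) / denominator(best)

    return sorted(best), lam


if __name__ == "__main__":
    a = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
    b = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]
    print(dinkelbach_top_k(a, b, k=2))
```

Each pass costs one top-k selection instead of an enumeration of all subsets, and the ratio strictly increases until no subset can improve it.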

IMPACT Provides a more efficient method for identifying influential data points, potentially improving the robustness and interpretability of machine learning models.

RANK_REASON The cluster contains two arXiv papers on a statistical method for identifying influential data sets.


AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. arXiv stat.ML TIER_1 English(EN) · Lucas D. Konrad, Nikolas Kuschnig ·

    Finding Most Influential Sets

    arXiv:2606.05919v1 Announce Type: new Abstract: Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional l…

  2. arXiv stat.ML TIER_1 English(EN) · Nikolas Kuschnig ·

    Finding Most Influential Sets

    Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection…

  3. arXiv stat.ML TIER_1 English(EN) · Nikolas Kuschnig ·

    Finding Most Influential Sets

    Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection…

  4. arXiv stat.ML TIER_1 English(EN) · Lucas D. Konrad, Nikolas Kuschnig ·

    Testing Most Influential Sets

    arXiv:2510.20372v4 Announce Type: replace Abstract: Small influential data subsets can dramatically impact model conclusions, with a few data points overturning key findings. While recent work identifies these most influential sets, there is no formal way to tell when maximum inf…