PulseAugur

New hypothesis offers efficient data influence estimation for large models

Researchers have introduced the Mirrored Influence Hypothesis, which posits that the influence of training data on a model's test predictions can be estimated through the inverse problem: how the model's predictions on training samples would change if it were trained on specific test samples. The resulting method computes gradients only for the test samples and needs just a forward pass for each training point, which offers significant efficiency gains over existing methods, especially when the test set is much smaller than the training set. The method has been applied to data attribution for diffusion models, data leakage detection, mislabeled-data detection, and the analysis of memorization and behavior in language models.
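The idea in the summary above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes a two-weight logistic model, a single gradient step on one test sample, and a score defined as the drop in each training point's loss after that step. The names `mirrored_scores`, `loss`, `grad`, and the learning rate `lr` are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    # Forward pass of a linear-logistic model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def loss(w, x, y):
    # Binary cross-entropy on one example (forward pass only).
    p = predict(w, x)
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def grad(w, x, y):
    # Gradient of the cross-entropy loss with respect to the weights.
    p = predict(w, x)
    return [(p - y) * xi for xi in x]

def mirrored_scores(w, train, test_point, lr=0.1):
    # Assumed scoring rule: take one gradient step on the test sample,
    # then measure how much each training point's loss decreases.
    x_t, y_t = test_point
    g = grad(w, x_t, y_t)  # the only gradient computation
    w_new = [wi - lr * gi for wi, gi in zip(w, g)]
    # Only forward passes are needed over the training set.
    return [loss(w, x, y) - loss(w_new, x, y) for x, y in train]

w = [0.0, 0.0]
train = [
    ([1.0, 0.0], 1),  # same feature and label as the test sample
    ([1.0, 0.0], 0),  # same feature, opposite label
    ([0.0, 1.0], 1),  # unrelated feature
]
test_point = ([1.0, 0.0], 1)
scores = mirrored_scores(w, train, test_point)
```

In this toy setup the first training point receives a positive score, the second a negative score, and the third a score of zero. The cost is one gradient computation per test sample plus forward passes over the training set, which is where the efficiency gain described in the summary would come from when test samples are few.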

IMPACT Provides a more efficient method for understanding data influence, potentially improving model trustworthiness and aiding in tasks like data leakage detection.

RANK_REASON This is a research paper detailing a new hypothesis and method for influence estimation in machine learning models.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv stat.ML TIER_1 English(EN) · Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

    The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

    arXiv:2402.08922v3 Announce Type: replace-cross Abstract: Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustwort…