PulseAugur

Influcoder offers scalable data attribution for LLMs

Researchers have introduced Influcoder, a method for efficiently estimating how much individual training samples influence a large language model (LLM). It targets the scalability and speed limitations of existing influence function methods, with the aim of making attribution practical on large datasets. Influcoder is intended to support the curation of high-quality datasets by identifying samples that may contribute to undesirable model behaviors, such as toxicity.
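To make the idea of gradient-based data attribution concrete, here is a minimal sketch. It is not Influcoder itself and does not reproduce anything from the paper; it only illustrates one generic first-order approach, in which a training sample is scored by the dot product between its loss gradient and the loss gradient of a test sample. The toy logistic-regression model, the data, and every function name (`loss_gradient`, `influence_scores`, `rank_by_influence`) are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(weights, features, label):
    """Gradient of the logistic loss w.r.t. the weights for one sample."""
    pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
    return [(pred - label) * x for x in features]


def influence_scores(weights, train_set, test_sample):
    """Score each training sample by the dot product of its loss gradient
    with the test sample's loss gradient (hypothetical helper)."""
    test_grad = loss_gradient(weights, *test_sample)
    scores = []
    for features, label in train_set:
        grad = loss_gradient(weights, features, label)
        scores.append(sum(a * b for a, b in zip(grad, test_grad)))
    return scores


def rank_by_influence(scores):
    """Indices of training samples, highest score first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy setup: two-feature model with three training samples (all made up).
weights = [0.5, -0.25]
train_set = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 0)]
test_sample = ([1.0, 0.0], 1)

scores = influence_scores(weights, train_set, test_sample)
ranking = rank_by_influence(scores)
```

Under this scoring, a positive value means a gradient step on that training sample would also lower the test loss, and a negative value means it would raise it. Computing per-sample gradients like this is what becomes expensive at LLM scale, which is the cost the summary says Influcoder is designed to address.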

IMPACT Enables more efficient dataset curation and debugging for large language models.



AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English (EN) · Dimitri Kachler, Damien Sileo, Pascal Denis

    Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

    arXiv:2606.13668v1. Abstract: With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate ho…

  2. arXiv cs.CL TIER_1 English (EN) · Pascal Denis

    Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

    With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can p…