Researchers have developed Influcoder, a new method for efficiently estimating the influence of individual training samples on large language models (LLMs). The approach addresses the scalability and speed limitations of existing influence function methods, making it practical for large datasets. Influcoder aims to help curate high-quality datasets by identifying samples that may contribute to undesirable model behaviors, such as toxicity.
AI IMPACT Enables more efficient dataset curation and debugging for large language models.
RANK_REASON The cluster describes a new research paper detailing a novel method for data attribution in LLMs.
- arXiv
- Data attribution using frequent pattern analysis
- Influcoder
- Influence Functions
- large-language models
AI-generated summary · Google Gemini · from 2 sources.