PulseAugur

More data can hurt ML models, study finds

A machine learning experiment showed that adding more features to a model does not always improve performance and can even hurt it. In landslide detection from satellite data, the researchers found that increasing the number of input channels from 14 to 30 improved the F1 score by only a negligible 0.2%. The effect is related to the Hughes phenomenon: when features are highly correlated, they supply redundant information, so the model has to spread its learning capacity across more inputs without a proportional gain in useful signal.

IMPACT Highlights the importance of careful feature selection over simply increasing data volume for optimizing ML model performance.
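To make the redundancy argument concrete, here is a minimal sketch on synthetic data. It is not the study's data, model, or method. It builds 14 independent channels, adds 16 near-copies of them to reach 30, and then applies a simple greedy correlation filter. The function name `select_non_redundant`, the 0.95 threshold, and the noise level are all illustrative assumptions.

```python
import numpy as np


def select_non_redundant(X, threshold=0.95):
    """Greedily keep columns whose |Pearson r| with every kept column is below threshold.

    Hypothetical helper for illustration; pairwise correlation is only one
    simple way to flag redundant features.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
n_samples = 2000

# 14 independent synthetic "channels" (assumption: standard normal values).
base = rng.normal(size=(n_samples, 14))

# 16 extra channels, each a near-copy of one base channel plus small noise,
# so they carry almost no new information.
source = rng.integers(0, 14, size=16)
extra = base[:, source] + 0.05 * rng.normal(size=(n_samples, 16))

X = np.hstack([base, extra])  # 30 channels in total
kept = select_non_redundant(X)

print(f"channels in: {X.shape[1]}, channels kept: {len(kept)}")
```

In this toy setup the filter discards every added channel, because each one correlates almost perfectly with a channel that is already present. Real satellite bands are rarely this cleanly redundant, so a threshold like this would need tuning against validation performance.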

RANK_REASON The cluster is based on a peer-reviewed preprint discussing a machine learning experiment and its findings.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. Towards AI · English (EN) · Arslaan

    The Feature Selection Trap: Why ‘More Data’ Can Actively Hurt Your Machine Learning Model

    DISCLAIMER: This article is based on a peer-reviewed preprint co-authored with researchers at Cardiff University, currently under review at Frontiers in Remote Sensing: …