The Feature Selection Trap: Why ‘More Data’ Can Actively Hurt Your Machine Learning Model
A machine learning experiment showed that adding more features to a model does not always improve performance and can even hurt it. For landslide detection from satellite data, researchers found that increasing the number of input channels from 14 to 30 improved the F1 score by only a negligible 0.2%. The effect is related to the Hughes phenomenon: when the added features are highly correlated with existing ones, they carry redundant information, so the model has to spread its learning capacity across more inputs without a proportional gain in useful signal.
AI IMPACT: Highlights the importance of careful feature selection over simply adding more input features when optimizing ML model performance.
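The redundancy argument can be illustrated with a minimal sketch on synthetic data. Nothing here comes from the landslide study: the 14 "base" channels are random, the 16 extra channels are assumed to be noisy copies of them, the model is plain least squares rather than whatever the researchers used, and held-out R² stands in for F1. The helper names `drop_correlated` and `holdout_r2` and the 0.9 correlation threshold are hypothetical choices for illustration.

```python
import numpy as np


def drop_correlated(X, threshold=0.9):
    """Greedily keep columns whose |Pearson r| with every kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


def holdout_r2(X, y, n_train):
    """Fit least squares (with intercept) on the first n_train rows, score R^2 on the rest."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - A[n_train:] @ coef
    centered = y[n_train:] - y[n_train:].mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


rng = np.random.default_rng(0)
n_samples, n_train = 2000, 1500

# Assumption: 14 independent synthetic channels carry all of the signal.
base = rng.normal(size=(n_samples, 14))
# Assumption: 16 extra channels are near-copies of randomly chosen base channels.
src = rng.integers(0, 14, size=16)
extra = base[:, src] + 0.1 * rng.normal(size=(n_samples, 16))
X = np.hstack([base, extra])  # 30 channels in total

# Synthetic target that depends only on the base channels, plus noise.
y = base @ rng.normal(size=14) + rng.normal(size=n_samples)

kept = drop_correlated(X)
r2_14 = holdout_r2(X[:, :14], y, n_train)
r2_30 = holdout_r2(X, y, n_train)

print(f"channels kept after correlation filter: {len(kept)} of {X.shape[1]}")
print(f"held-out R^2 with 14 channels: {r2_14:.4f}")
print(f"held-out R^2 with 30 channels: {r2_30:.4f}")
```

Under these assumptions the correlation filter discards the 16 near-duplicate channels, and the held-out score barely moves between the 14-channel and 30-channel fits. Real satellite channels will be correlated in messier ways, so a filter like this is best treated as a first diagnostic rather than a complete feature-selection method.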