Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling
A new research paper proposes a machine learning pipeline that assigns retail product names to consumer-price categories. The method combines text normalization, a rule-based classifier that matches key phrases, and a binary confirmation model. It also includes a human-in-the-loop labeling protocol with reliability weighting, which supports continuous fine-tuning.
IMPACT: Provides a scalable method for classifying unstructured product data, which could improve price index accuracy and market analysis.
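To make the stages named in the summary concrete, here is a minimal Python sketch of how such a pipeline might fit together. It is not the paper's implementation. The key phrases, category names, annotator reliabilities, and every function name are hypothetical, and a weighted perceptron over bag-of-words counts stands in for the binary confirmation model, whose actual form the summary does not specify.

```python
import re
from collections import Counter, defaultdict

# Hypothetical key phrases and category names, for illustration only.
RULES = {
    "whole milk": "dairy",
    "olive oil": "oils_and_fats",
    "white bread": "bread_and_cereals",
}

def normalize(name: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def rule_candidates(name: str) -> list[str]:
    """Return categories whose key phrase occurs as whole words in the name."""
    text = f" {normalize(name)} "
    return [cat for phrase, cat in RULES.items() if f" {phrase} " in text]

def bag_of_words(name: str) -> Counter:
    """Token counts of the normalized name."""
    return Counter(normalize(name).split())

def weighted_label(votes, reliability) -> bool:
    """Combine annotator yes/no votes, each weighted by an assumed reliability."""
    score = 0.0
    for annotator, vote in votes:
        w = reliability.get(annotator, 0.5)  # assumed default for unknown annotators
        score += w if vote else -w
    return score > 0

def train_confirmer(examples, epochs: int = 10):
    """Weighted perceptron. examples: (name, label, weight) triples."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for name, label, weight in examples:
            feats = bag_of_words(name)
            score = bias + sum(weights[t] * c for t, c in feats.items())
            if (score > 0) != label:
                # Misclassified: move toward the label, scaled by label weight.
                step = weight if label else -weight
                for t, c in feats.items():
                    weights[t] += step * c
                bias += step
    return dict(weights), bias

def confirm(name: str, model) -> bool:
    """Accept or reject a rule-proposed category for this name."""
    weights, bias = model
    feats = bag_of_words(name)
    return bias + sum(weights.get(t, 0.0) * c for t, c in feats.items()) > 0

# Hypothetical labeled examples for the "oils_and_fats" confirmer.
EXAMPLES = [
    ("Extra Virgin Olive Oil 500ml", True, 0.9),
    ("Olive Oil Soap Bar", False, 0.8),
    ("Olive Oil Hand Cream", False, 0.7),
    ("Olive Oil 1L Bottle", True, 1.0),
]
MODEL = train_confirmer(EXAMPLES)
```

In this sketch the rule stage proposes `oils_and_fats` for both a cooking oil and "Olive Oil Soap Bar", and the confirmer trained on the reliability-weighted examples rejects the soap. The per-example weights are entered by hand here; in practice they would presumably come from the labeling protocol, for instance from the margin of the weighted vote.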