PulseAugur
Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling

Machine learning pipeline maps noisy retail product names to price categories

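A new research paper proposes a machine learning pipeline for classifying retail product names into consumer-price categories. The method combines text normalization, a rule-based classifier that uses key phrases, and a binary confirmation model. It also includes a reliability-weighted human-in-the-loop labeling protocol for continual fine-tuning.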
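As a rough illustration of the stages named above, here is a minimal Python sketch of text normalization, a key-phrase rule classifier, and a reliability-weighted vote over human labels. It is not the paper's implementation: the category names, key phrases, function names, and reliability scores are all hypothetical, the normalization assumes ASCII product names, and the binary confirmation model is left out because the summary gives no details about it.

```python
import re
from collections import defaultdict

# Hypothetical key phrases per category (assumption: the paper's
# actual rule set is not given in the summary).
KEY_PHRASES = {
    "milk": ["milk", "mlk"],
    "bread": ["bread", "loaf"],
}


def normalize(name: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into one space, trim."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9]+", " ", name)  # assumes ASCII input
    return name.strip()


def rule_classify(name: str):
    """Return the first category with a key phrase matching whole tokens, else None."""
    padded = f" {normalize(name)} "  # padding makes matches token-aligned
    for category, phrases in KEY_PHRASES.items():
        if any(f" {phrase} " in padded for phrase in phrases):
            return category
    return None


def weighted_vote(labels):
    """Pick the category with the largest summed annotator reliability.

    labels: non-empty list of (category, reliability) pairs, where
    reliability is a hypothetical per-annotator score in [0, 1].
    """
    totals = defaultdict(float)
    for category, reliability in labels:
        totals[category] += reliability
    return max(totals, key=totals.get)
```

For example, `rule_classify("WHL MLK 2% 1L")` matches the abbreviated token `mlk`, while a name with no key phrase returns `None` and would be a natural candidate for human labeling. In `weighted_vote`, one highly reliable annotator can outweigh two less reliable ones, which is the basic idea behind reliability weighting; how the paper actually estimates reliability is not stated in the summary.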

Impact: Provides a scalable way to classify unstructured product data, which could improve the accuracy of price indices and market analysis.

Ranking rationale: This cluster contains an academic paper detailing a novel machine learning method.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Vladimir Beskorovainyi ·

    Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling

    arXiv:2606.02004v1 Announce Type: new Abstract: Consumer-price measurement increasingly draws on alternative data sources -- scanner, web-scraped, and transaction/receipt data. A recurring obstacle is that product descriptions in such sources are short, noisy, and abbreviated, wi…

  2. arXiv cs.CL TIER_1 English(EN) · Vladimir Beskorovainyi ·

    Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling

    Consumer-price measurement increasingly draws on alternative data sources -- scanner, web-scraped, and transaction/receipt data. A recurring obstacle is that product descriptions in such sources are short, noisy, and abbreviated, with no standard product code, so each item must f…