PulseAugur
Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

Co-Scraper framework uses Qwen3 8B for advanced web data extraction

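Researchers have developed Co-Scraper, a new two-stage framework for efficient web data extraction. The system uses a fine-tuned Qwen3 8B model to combine query-aware DOM pruning with the induction of stable extraction strategies. On the SWDE dataset, Co-Scraper reports state-of-the-art performance, with a 94.78% F1 score and a 90.39% reusable success rate, significantly improving the accuracy and resilience of web data collection.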
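To make the two stages concrete, here is a minimal sketch of the general idea. It is not Co-Scraper's method: the framework relies on a fine-tuned Qwen3 8B model, whereas this sketch swaps in a naive keyword-overlap heuristic for pruning and a hand-written XPath rule in place of an induced extraction strategy. The names `prune`, `extract`, `PAGE`, and `RULE` are hypothetical, and the page is assumed to be well-formed markup so the standard-library XML parser can read it.

```python
import re
import xml.etree.ElementTree as ET


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def prune(element, query_tokens):
    """Remove subtrees that share no token with the query (toy heuristic).

    Returns True if the element itself or any descendant matched.
    Tail text after a removed child is dropped along with it.
    """
    # Leading text and attribute values of this element only.
    own_text = " ".join([element.text or ""] + list(element.attrib.values()))
    relevant = bool(tokens(own_text) & query_tokens)

    # Iterate over a copy because children are removed inside the loop.
    for child in list(element):
        if prune(child, query_tokens):
            relevant = True
        else:
            element.remove(child)
    return relevant


def extract(page_xml, rule):
    """Apply one extraction rule to a page; None if nothing matches."""
    node = ET.fromstring(page_xml).find(rule)
    return node.text if node is not None else None


# Hypothetical product page, assumed well-formed for this sketch.
PAGE = """
<html><body>
  <nav><a href="/home">Home</a><a href="/deals">Deals</a></nav>
  <div class="product">
    <h1 class="title">Trail Camera X2</h1>
    <span class="price">$129.00</span>
  </div>
  <footer>Contact us</footer>
</body></html>
"""

# Stage 1 (stand-in): keep only the parts of the DOM related to the query.
root = ET.fromstring(PAGE)
prune(root, tokens("product price"))
pruned = ET.tostring(root, encoding="unicode")

# Stage 2 (stand-in): a rule written once and reused on similar pages.
RULE = ".//span[@class='price']"
```

The pruned tree is much smaller than the original page, which is the point of pruning before any model sees the DOM. The same `RULE` can then be applied through `extract` to other pages that share the template, which is what makes a synthesized scraper reusable.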

Impact: Improves the accuracy and resilience of web data collection tasks through advanced AI techniques.

Ranking rationale: This cluster describes a research paper published on arXiv that details a new framework for web data extraction.

Read on arXiv cs.IR (Information Retrieval) →

AI-generated summary · Google Gemini · From 2 sources.

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Shoupeng Wang, Jiantao Qiu, Wuyang Zhang, Conghui He

    Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

    arXiv:2606.14821v1 Announce Type: cross Abstract: The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In …

  2. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Conghui He

    Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

    The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In this work, we propose Co-Scraper, a two-stage fram…