Co-Scraper: Query-Aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction
Researchers have introduced Co-Scraper, a two-stage framework for web data extraction. It uses a fine-tuned Qwen3 8B model to combine query-aware DOM pruning with the induction of stable, reusable extraction strategies. On the SWDE dataset, Co-Scraper reports state-of-the-art results, with a 94.78% F1 score and a 90.39% reuse success rate.
AI IMPACT: Enhances accuracy and resilience in web data acquisition tasks.
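
As a rough illustration of the two-stage idea described above, the sketch below prunes a page down to the nodes relevant to a query, derives an extraction path from what survives, and then reuses that path on a second page built from the same template. This is not Co-Scraper's actual method: simple keyword matching stands in for the fine-tuned model, and every name here (`collect`, `prune`, `induce_scraper`, `apply_scraper`, the sample pages) is hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextCollector(HTMLParser):
    """Collect (path, text) pairs; a path is a tuple of 'tag.class' steps."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open element.
        if self.stack and self.stack[-1].split(".")[0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def collect(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def prune(html, query):
    """Stage 1 stand-in: keep text nodes whose path or text mentions a query term."""
    terms = query.lower().split()
    kept = []
    for path, text in collect(html):
        haystack = ("/".join(path) + " " + text).lower()
        if any(term in haystack for term in terms):
            kept.append((path, text))
    return kept


def induce_scraper(html, query):
    """Stage 2 stand-in: use the path of the first surviving node as the scraper."""
    kept = prune(html, query)
    return kept[0][0] if kept else None


def apply_scraper(path, html):
    """Reuse a previously induced path on another page with the same template."""
    return [text for p, text in collect(html) if p == path]


TEMPLATE = (
    '<html><body><div class="nav">Home</div>'
    '<div class="product"><h1 class="title">{name}</h1>'
    '<span class="price">{price}</span></div></body></html>'
)
page_a = TEMPLATE.format(name="Red Kettle", price="$24.99")
page_b = TEMPLATE.format(name="Blue Mug", price="$9.50")

scraper = induce_scraper(page_a, "price")
print(apply_scraper(scraper, page_b))  # ['$9.50']
```

The point of the second stage is that the path is derived once and then applied to further pages without re-running the pruning step, which is roughly what a reuse success rate would measure.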