A large-scale pipeline for LLM-assisted corpus annotation: variation and change in the English consider construction

LLM pipeline automates corpus annotation with over 98% accuracy

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

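Researchers have developed a novel four-stage pipeline that uses large language models (LLMs) to automate grammatical annotation of large natural-language corpora. The method comprises prompt engineering, pre-hoc evaluation, batch processing, and post-hoc validation. Run through the OpenAI API, it achieved over 98% accuracy when annotating 143,933 concordance lines of "consider" from the Corpus of Historical American English. The subsequent analysis revealed previously undocumented, genre-specific changes in the evaluative consider construction, which suggests that LLMs can substantially accelerate corpus-linguistic research by opening up questions that practical constraints had previously put out of reach.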
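The four stages named above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the label set, the prompt wording, the 0.98 threshold used as a gate, and every function name here are assumptions made for illustration. The `annotate` callable stands in for a real model call (for example, a wrapper around the OpenAI API, which is not shown), and a toy rule-based stub replaces it so the sketch runs offline.

```python
import random
from typing import Callable, Sequence

# Hypothetical label set; the paper's actual annotation scheme is not given here.
LABELS = ("evaluative", "other")


def build_prompt(line: str) -> str:
    """Stage 1 (prompt engineering): wrap a concordance line in instructions."""
    return (
        "Classify the use of 'consider' in the line below as one of: "
        f"{', '.join(LABELS)}. Reply with the label only.\n\nLine: {line}"
    )


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of predictions that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def annotate_batch(lines: Sequence[str], annotate: Callable[[str], str],
                   batch_size: int = 100) -> list[str]:
    """Stage 3 (batch processing): label all lines, one chunk at a time."""
    labels: list[str] = []
    for start in range(0, len(lines), batch_size):
        for line in lines[start:start + batch_size]:
            labels.append(annotate(build_prompt(line)))
    return labels


def run_pipeline(lines, gold_sample, annotate, threshold=0.98,
                 audit_size=50, seed=0):
    """Run stages 2-4 and return (labels, pre-hoc accuracy, audit indices)."""
    # Stage 2 (pre-hoc evaluation): check the prompt on hand-labelled lines first.
    gold_lines = [line for line, _ in gold_sample]
    gold_labels = [label for _, label in gold_sample]
    pre_hoc = accuracy(annotate_batch(gold_lines, annotate), gold_labels)
    if pre_hoc < threshold:
        raise RuntimeError(f"pre-hoc accuracy {pre_hoc:.3f} is below {threshold}")

    # Stage 3 (batch processing): annotate the full data set.
    labels = annotate_batch(lines, annotate)

    # Stage 4 (post-hoc validation): draw a random sample for manual checking.
    rng = random.Random(seed)
    audit = sorted(rng.sample(range(len(lines)), min(audit_size, len(lines))))
    return labels, pre_hoc, audit


def stub_annotate(prompt: str) -> str:
    """Toy stand-in for a model call; a real pipeline would query an LLM here."""
    return "evaluative" if "considered him" in prompt else "other"


gold = [("They considered him a genius.", "evaluative"),
        ("She will consider the offer.", "other")]
corpus = ["We considered him foolish.", "Consider the lilies.",
          "I considered him a friend."]
labels, pre_hoc, audit = run_pipeline(corpus, gold, stub_annotate, audit_size=2)
print(labels, pre_hoc, audit)
```

Passing the annotator in as a callable keeps the evaluation and validation logic independent of any particular model or provider, so the same checks can be rerun whenever the prompt or the model changes.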

Impact: Enables large-scale linguistic research that the manual annotation bottleneck previously made infeasible.

Ranking rationale: This cluster describes a paper detailing a new method for LLM-assisted corpus annotation.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Cameron Morin, Matti Marttinen Larsson · 2026-06-16 04:00

A large-scale pipeline for LLM-assisted corpus annotation: variation and change in the English consider construction

arXiv:2510.12306v3 Announce Type: replace Abstract: As natural language corpora expand at an unprecedented rate, manual annotation remains a significant methodological bottleneck in corpus linguistic work. We address this challenge by presenting a scalable pipeline for automating…

