How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation

Large language models struggle with historical Italian, but a contextual prompt offers some relief

By the PulseAugur editorial team · [2 sources] · 2026-06-25 16:52

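A new research paper proposes a diagnostic framework for understanding how large language models (LLMs) process historical language. It decomposes difficulty into four parts: tokenization cost, predictive uncertainty, semantic robustness, and context sensitivity. The study evaluates the framework on 17th-century Italian, 19th-century Italian, and 18th-century Russian texts. The results indicate that although historical text incurs an encoding cost, LLMs can still represent historical meaning, and a simple temporal context prompt can significantly reduce surprisal on historical text.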
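As a rough illustration of two of the quantities named above, the sketch below computes a tokens-per-word ratio (one common way to express tokenization cost) and mean surprisal in bits (one common way to express predictive uncertainty). The summary does not give the paper's exact metric definitions or the wording of its temporal context prompt, so the function names, the toy tokenizer, and the prompt text are all assumptions made for illustration only.

```python
import math
from typing import Callable, Sequence


def toy_tokenize(text: str, chunk: int = 4) -> list[str]:
    """Hypothetical stand-in for a real subword tokenizer.

    Splits each whitespace-separated word into fixed-size character chunks.
    """
    return [w[i:i + chunk] for w in text.split() for i in range(0, len(w), chunk)]


def tokens_per_word(text: str, tokenize: Callable[[str], Sequence[str]]) -> float:
    """Tokens emitted per whitespace-separated word (assumed cost measure)."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


def mean_surprisal_bits(token_probs: Sequence[float]) -> float:
    """Average of -log2(p) over the probabilities a model gave each token."""
    if not token_probs:
        raise ValueError("no token probabilities given")
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)


def add_temporal_context(text: str, period: str) -> str:
    """Prepend a period hint; this wording is an assumption, not the paper's."""
    return f"The following text was written in {period}.\n\n{text}"


# Example: a longer historical spelling costs more tokens per word here.
modern_cost = tokens_per_word("la casa", toy_tokenize)
historical_cost = tokens_per_word("havere la", toy_tokenize)
```

In practice the token probabilities would come from a real model scored once on the bare text and once on the output of `add_temporal_context`; the difference between the two mean surprisal values is the kind of reduction the summary describes.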

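Impact: This research offers a way to better understand, and potentially improve, LLM performance on historical text, which could benefit digital library workflows.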

Ranking rationale: This is a research paper detailing a new diagnostic framework for evaluating LLM performance on historical languages.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Coverage sources [2]

arXiv cs.CL TIER_1 English(EN) · Maria Levchenko · 2026-06-26 04:00

How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation

arXiv:2606.27275v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly critical to digital library workflows, yet their ability to process historical language remains poorly understood. Historical difficulty is typically treated as a monolithic barrier, con…

arXiv cs.CL TIER_1 English(EN) · Maria Levchenko · 2026-06-25 16:52

How Surprising Is Historical Italian to Language Models? Tokenization Tax, Comprehension Tax, and a Simple Mitigation

Large language models (LLMs) are increasingly critical to digital library workflows, yet their ability to process historical language remains poorly understood. Historical difficulty is typically treated as a monolithic barrier, conflating orthographic variation, linguistic dista…