LLMs Corrupt Your Documents When You Delegate. Philippe Laban, Tobias Schnabel, Jennifer Neville (Microsoft Research)

Microsoft Research: LLMs Corrupt 25% of Documents in Delegated Tasks

By the PulseAugur editorial team · [2 sources] · 2026-05-11 23:15

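DELEGATE-52, a new benchmark developed by Microsoft Research, shows that current large language models significantly corrupt documents in delegated workflows. Even advanced models such as Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt about 25% of document content during extended editing tasks. Agentic tools make the problem worse, adding another 6% degradation. The results point to widespread trust and reliability problems for AI-assisted document editing across professional domains.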

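Impact: Current LLMs introduce significant errors into documents during delegated tasks, undermining trust in them and their readiness for enterprise adoption.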

Ranking rationale: This cluster reports a new benchmark and its findings on LLM performance in document editing tasks.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

Mastodon — mastodon.social TIER_1 English(EN) · [email protected] · 2026-05-18 10:05

LLMs Corrupt Your Documents When You Delegate Philippe Laban, Tobias Schnabel, Jennifer Neville ( # Microsoft Research) Large Language Models ( # LLMs ) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). D…

Link: arxiv.org/…/2604.15597
Mastodon — mastodon.social TIER_1 English(EN) · AIntelligenceHub · 2026-05-11 23:15

A new Microsoft Research benchmark called DELEGATE-52 found something enterprise teams need to know: even the best models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupted 25% of document content over 20 interactions. Agentic tools added another 6% degradation. Only Python cod…

Link: aintelligencehub.com/…/ai-agents-corrupt-…
