PulseAugur

Microsoft Research: LLMs Corrupt 25% of Document Content in Delegated Tasks

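DELEGATE-52, a new benchmark developed by Microsoft Research, shows that current large language models significantly corrupt documents in delegated workflows. Even advanced models such as Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt about 25% of document content in extended editing tasks. Agentic tools make the problem worse, adding another 6% degradation, which points to widespread trust and reliability problems for AI-assisted document editing across professional domains.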

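Impact: Current LLMs introduce significant errors into documents during delegated tasks, undermining the trust and readiness needed for enterprise adoption.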

Ranking rationale: This cluster reports a new benchmark and its findings on LLM performance in document-editing tasks.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    LLMs Corrupt Your Documents When You Delegate. Philippe Laban, Tobias Schnabel, Jennifer Neville (#Microsoft Research). Large Language Models (#LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). D…

  2. Mastodon — mastodon.social TIER_1 English(EN) · AIntelligenceHub ·

    A new Microsoft Research benchmark called DELEGATE-52 found something enterprise teams need to know: even the best models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupted 25% of document content over 20 interactions. Agentic tools added another 6% degradation. Only Python cod…