When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation

Study finds LLM agents are more sensitive to semantic noise than to formatting noise

By the PulseAugur editorial team · [2 sources] · 2026-05-25 15:57

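A new study examines how large language model (LLM) agents respond to different types of noise during reasoning. The researchers found that meaning-bearing perturbations, such as paraphrases, change LLM agents' final answers more often than presentation-level changes, such as reformatting, even when the two are matched for severity. The study validated these findings on held-out models. It also proposes a "hidden divergence" mechanism: semantic changes alter intermediate reasoning steps, and those altered steps lead to different final answers.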
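The headline comparison is answer-change frequency by perturbation type, plus the point where a perturbed reasoning trace departs from the clean one. The Python sketch below shows one way such quantities might be tallied. It is an illustration under stated assumptions, not the authors' code. The `Trial` record, the `normalize`, `flip_rates` and `first_divergence` helpers, the two category labels and the toy data are all hypothetical. The sketch also does not model the paper's severity matching or its 68-cell design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


# Hypothetical record: one agent run on a clean input paired with a run on a
# perturbed copy of the same input. Field names are illustrative only.
@dataclass
class Trial:
    category: str               # e.g. "semantic" (paraphrase, synonym) or "presentation" (formatting)
    clean_answer: str           # final answer on the unperturbed input
    perturbed_answer: str       # final answer on the perturbed input
    clean_steps: list[str]      # intermediate reasoning steps, clean run
    perturbed_steps: list[str]  # intermediate reasoning steps, perturbed run


def normalize(answer: str) -> str:
    """Crude normalization so trivially different strings count as equal."""
    return answer.strip().lower()


def flip_rates(trials: list[Trial]) -> dict[str, float]:
    """Fraction of trials in each category whose final answer changed."""
    flips: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for t in trials:
        totals[t.category] += 1
        if normalize(t.clean_answer) != normalize(t.perturbed_answer):
            flips[t.category] += 1
    return {cat: flips[cat] / n for cat, n in totals.items()}


def first_divergence(clean_steps: list[str], perturbed_steps: list[str]) -> Optional[int]:
    """Index of the first step where the two traces differ, or None if identical."""
    for i, (a, b) in enumerate(zip(clean_steps, perturbed_steps)):
        if a != b:
            return i
    if len(clean_steps) != len(perturbed_steps):
        # One trace is a strict prefix of the other: divergence starts where the shorter ends.
        return min(len(clean_steps), len(perturbed_steps))
    return None


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    trials = [
        Trial("semantic", "42", "41", ["parse", "add", "answer"], ["parse", "subtract", "answer"]),
        Trial("semantic", "Paris", "Paris", ["recall", "answer"], ["recall", "answer"]),
        Trial("presentation", "42", "42", ["parse", "add"], ["parse", "add"]),
        Trial("presentation", "yes", "Yes ", ["check"], ["check"]),
    ]
    print(flip_rates(trials))  # {'semantic': 0.5, 'presentation': 0.0}
    print(first_divergence(trials[0].clean_steps, trials[0].perturbed_steps))  # 1
```

Comparing steps by exact string equality is a deliberately crude proxy. A real trace-level analysis would likely need a softer notion of when two reasoning steps are equivalent.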

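Impact: The finding highlights a key vulnerability of LLM agents, suggesting that subtle semantic changes can seriously disrupt their reasoning.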

Ranking rationale: This cluster contains an empirical research paper detailing the behavior of LLM agents.


AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.CL TIER_1 English(EN) · Liyun Zhang, Jiayi Guo · 2026-05-26 04:00

When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation

arXiv:2605.25981v1 · Announce Type: new
Abstract: We document an empirical phenomenon in chain-of-thought and ReAct agents driven by ten large language models from seven architecture families: meaning-bearing perturbations (e.g., paraphrase, synonym) alter final answers more often …
arXiv cs.CL TIER_1 English(EN) · Jiayi Guo · 2026-05-25 15:57

When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation

We document an empirical phenomenon in chain-of-thought and ReAct agents driven by ten large language models from seven architecture families: meaning-bearing perturbations (e.g., paraphrase, synonym) alter final answers more often than presentation perturbations (e.g., formattin…
