LLM Agents More Sensitive to Semantic Noise Than Formatting, Study Finds

By PulseAugur Editorial · [2 sources] · 2026-05-25 15:57

A new study examines how Large Language Model (LLM) agents respond to different types of noise. The researchers report that meaning-bearing perturbations, such as paraphrasing or synonym substitution, change agents' final answers more often than presentation perturbations such as reformatting, even when severity is matched. The findings were validated on a held-out model, and the authors propose a 'stealth-divergence' mechanism in which semantic changes alter intermediate reasoning steps and thereby lead to different final answers.
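
To make the comparison concrete, the kind of measurement described above can be sketched as an answer-change rate: the fraction of tasks whose final answer differs once the input is perturbed. The following is a minimal illustration only, not the authors' code or metric definition; the function name `answer_change_rate` and the toy answers are hypothetical and are not data from the study.

```python
from typing import Sequence


def answer_change_rate(baseline: Sequence[str], perturbed: Sequence[str]) -> float:
    """Return the fraction of items whose final answer changed after perturbation.

    Answers are compared after trimming whitespace and lowercasing, which is an
    assumption made for this sketch, not a rule taken from the paper.
    """
    if len(baseline) != len(perturbed):
        raise ValueError("answer lists must have the same length")
    if not baseline:
        raise ValueError("need at least one answer pair")
    changed = sum(
        b.strip().lower() != p.strip().lower()
        for b, p in zip(baseline, perturbed)
    )
    return changed / len(baseline)


# Hypothetical toy answers for four tasks (illustrative values, not study results).
baseline = ["paris", "42", "yes", "blue"]
after_reformat = ["Paris", "42", "yes", "blue"]    # presentation-style perturbation
after_paraphrase = ["paris", "41", "no", "blue"]   # meaning-bearing perturbation

surface_rate = answer_change_rate(baseline, after_reformat)
semantic_rate = answer_change_rate(baseline, after_paraphrase)
print(f"surface: {surface_rate:.2f}, semantic: {semantic_rate:.2f}")
```

In this toy setup the paraphrased inputs change half of the answers while the reformatted inputs change none, which mirrors the direction of the reported effect without reproducing any of its numbers.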

IMPACT Highlights a key vulnerability in LLM agents, suggesting that subtle semantic changes can significantly derail reasoning processes.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Liyun Zhang, Jiayi Guo · 2026-05-26 04:00

When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation

arXiv:2605.25981v1. Abstract: We document an empirical phenomenon in chain-of-thought and ReAct agents driven by ten large language models from seven architecture families: meaning-bearing perturbations (e.g., paraphrase, synonym) alter final answers more often …

arXiv cs.CL TIER_1 English(EN) · Jiayi Guo · 2026-05-25 15:57

When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation

We document an empirical phenomenon in chain-of-thought and ReAct agents driven by ten large language models from seven architecture families: meaning-bearing perturbations (e.g., paraphrase, synonym) alter final answers more often than presentation perturbations (e.g., formattin…
