Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 9h

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

Researchers have developed a new metric, the Safety Asymmetry Score (SAS), to evaluate how language models' vulnerability to adversarial attacks changes based on the delivery channel of the malicious content. Their study, which tested six production LLMs, found that models designed for agentic roles are more susceptible to attacks embedded in tool descriptions than in user messages. This vulnerability shifts when the content appears in tool outputs, indicating that models may implicitly trust tool metadata more than user input. AI

IMPACT Highlights a critical safety blind spot in current tool-using LLMs, potentially impacting the security of AI agents.

Llama 3.3 70B
LLMs
Mohammed Sameer Syed
Safety Asymmetry Score