Researchers have identified a "narration gap" in loops that pair a Large Language Model (LLM) with a formal solver: even when the solver's verdict is sound, the final answer presented to the user may not be. Formal tools such as SAT and SMT solvers produce verifiable outputs, but the step that translates those outputs into a user-friendly narration is susceptible to manipulation. Experiments with five open-source models showed that while techniques like certificate gating can ensure the soundness of the solver verdict, adversaries can still exploit phrasing and channel variations to invert verified conclusions. Hardened prompts reduce injection vulnerabilities but are not immune to adaptive attacks, indicating that robustness does not extend to the user-facing answer.
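As a rough illustration of the distinction between a gated verdict and its narration, here is a minimal Python sketch. It assumes a SAT instance given as DIMACS-style integer clauses and treats a satisfying model as the certificate. All function names and the trailing `[verdict: ...]` tag convention are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def check_sat_certificate(clauses: List[Clause], model: Dict[int, bool]) -> bool:
    """Return True only if the model satisfies every clause."""
    for clause in clauses:
        # A clause holds if at least one literal matches the model's assignment.
        # Unassigned variables (None) never match, so they cannot satisfy a literal.
        if not any(model.get(abs(lit)) == (lit > 0) for lit in clause):
            return False
    return True


def gated_verdict(
    clauses: List[Clause], claimed_sat: bool, model: Optional[Dict[int, bool]]
) -> str:
    """Accept a SAT claim only when its certificate checks out."""
    if claimed_sat and model is not None and check_sat_certificate(clauses, model):
        return "SAT"
    return "UNVERIFIED"


def narration_agrees(verdict: str, narration: str) -> bool:
    """Naive consistency check on the user-facing text.

    Assumes (hypothetically) that the narration must end with a
    machine-readable tag such as "[verdict: SAT]".
    """
    return narration.strip().endswith(f"[verdict: {verdict}]")


clauses = [[1, -2], [2, 3]]
model = {1: True, 2: False, 3: True}
verdict = gated_verdict(clauses, claimed_sat=True, model=model)

# The verdict is verified, yet the narration can still state the opposite.
inverted = "The constraints cannot be met. [verdict: UNSAT]"
print(verdict, narration_agrees(verdict, inverted))
```

The gate protects only the value returned by `gated_verdict`; the narration is a separate string that has to be checked on its own. A tag comparison like this is only one illustrative guard and is not claimed to withstand the phrasing and channel attacks the summary describes.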
IMPACT Highlights a critical vulnerability in LLM reasoning pipelines that could undermine trust in AI-assisted decision-making.
RANK_REASON Academic paper detailing a novel vulnerability in LLM-solver interaction.
Read on Hugging Face Daily Papers →
AI-generated summary · Google Gemini · from 1 source.