A new research paper investigates how large language models process negation. It finds that models such as Mistral-7B and Llama-3.1-8B contain internal components capable of handling negation, but their accuracy is often undermined by late-layer attention mechanisms that favor shortcuts. The study shows that these models use two mechanisms, attentional suppression and direct vector representation of negative phrases, and that the latter is the more dominant of the two. By analyzing these internal processes, the research aims to deepen understanding of LLM internals and of how competing mechanisms interact.
Impact: Provides deeper insight into LLM internals, potentially guiding future model development toward improved reasoning.
Ranking rationale: This is a research paper published on arXiv that details interpretability findings about LLMs.
AI-generated summary · Google Gemini · from 2 sources.