The Truth Stays in the Family: Enhancing Contextual Grounding via Inherited Truthful Heads in Model Lineages
A new research paper explores the preservation of contextual truthfulness across model lineages, finding that truth scores are strongly maintained from foundational large language models (LLMs) to their downstream variants, including instruction-tuned and multimodal adaptations. This inheritance is linked to the preservation of attention head weights. The study proposes a method called TruthProbe, which amplifies context-truthful heads to improve truthfulness and reduce hallucinations in models like Vicuña, Qwen2.5, LLaMA2, and Mistral. AI
IMPACT Suggests that foundational model truthfulness is a stable trait, potentially simplifying the development of more reliable downstream AI models.