Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations
A new research paper examines the trade-off between plausibility and faithfulness in cross-lingual explanations produced by large language models. It finds that explanations generated in English for non-English inputs can be less faithful to the model's actual reasoning process, even when they appear plausible. This loss of faithfulness, measured by comprehensiveness and sufficiency, can be substantial: comprehensiveness is up to 5.7 times lower than for explanations generated in the input language. The authors argue that auditing explanations in the input language and using multiple complementary faithfulness metrics are essential for accurate model evaluation.
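The summary does not define the two metrics. A common formulation, popularized by the ERASER benchmark, scores a rationale (the input tokens an explanation points to) by how much the model's confidence in its prediction drops when those tokens are deleted (comprehensiveness, higher is more faithful) and when only those tokens are kept (sufficiency, lower is more faithful). The sketch below assumes that formulation; the paper may use a variant. The toy scorer `toy_prob`, the Spanish sentence, and both rationales are hypothetical, chosen only to show how a rationale that misses the words the model relies on scores worse.

```python
from typing import Callable, Sequence, Set

# Hypothetical model interface: returns the model's probability for the label
# it originally predicted on the full input. A real audit would query the LLM.
ProbFn = Callable[[Sequence[str]], float]


def comprehensiveness(prob: ProbFn, tokens: Sequence[str], rationale: Set[int]) -> float:
    """Confidence drop when rationale tokens are deleted (higher = more faithful)."""
    without = [t for i, t in enumerate(tokens) if i not in rationale]
    return prob(tokens) - prob(without)


def sufficiency(prob: ProbFn, tokens: Sequence[str], rationale: Set[int]) -> float:
    """Confidence drop when only rationale tokens are kept (lower = more faithful)."""
    only = [t for i, t in enumerate(tokens) if i in rationale]
    return prob(tokens) - prob(only)


# Toy (hypothetical) sentiment scorer: confidence rises by 0.2 per positive
# cue word, starting from 0.5 and capped at 1.0.
CUES = {"excelente", "buena"}


def toy_prob(tokens: Sequence[str]) -> float:
    hits = sum(t in CUES for t in tokens)
    return min(1.0, 0.5 + 0.2 * hits)


TOKENS = ["la", "película", "fue", "excelente", "y", "muy", "buena"]
NATIVE = {3, 6}      # rationale marking exactly the words the scorer uses
MISALIGNED = {1, 3}  # hypothetical rationale mapped back from an English explanation

if __name__ == "__main__":
    for name, r in [("native", NATIVE), ("misaligned", MISALIGNED)]:
        c = comprehensiveness(toy_prob, TOKENS, r)
        s = sufficiency(toy_prob, TOKENS, r)
        print(f"{name}: comprehensiveness={c:.2f} sufficiency={s:.2f}")
    # prints:
    # native: comprehensiveness=0.40 sufficiency=0.00
    # misaligned: comprehensiveness=0.20 sufficiency=0.20
```

In this toy case the misaligned rationale halves comprehensiveness and raises sufficiency, which is the direction of degradation the study reports. It illustrates the mechanics only and does not reproduce the paper's 5.7-fold figure.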
IMPACT: Highlights potential inaccuracies in cross-lingual LLM auditing, emphasizing the need for native-language explanations and robust faithfulness metrics.