New framework Macro enhances multilingual LLM explanations

By PulseAugur Editorial Team · [1 source] · 2026-05-12 06:56

Researchers have developed a new framework called Macro to improve how large language models generate counterfactual explanations across multiple languages. Macro is a preference-alignment framework that uses Direct Preference Optimization (DPO) to balance two competing goals: validity (the edited input actually flips the model's prediction) and minimality (the input is changed as little as possible). Managing this trade-off has been difficult in non-English languages. In experiments across seven languages, Macro substantially improved the validity of explanations without sacrificing minimality, outperforming both chain-of-thought and supervised fine-tuning baselines.
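The summary does not say how Macro builds its preference pairs. The sketch below is a minimal, hypothetical illustration of the general recipe. Candidate counterfactuals are ranked first by validity and then by minimality. The best- and worst-ranked candidates become a (chosen, rejected) pair, and that pair is scored with the standard DPO objective. The `Candidate` schema, the ranking rule, and `beta=0.1` are assumptions for illustration, not Macro's actual design.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One self-generated counterfactual, already scored (hypothetical schema)."""
    text: str
    valid: bool        # True if the model's own prediction flipped
    edit_ratio: float  # fraction of tokens changed; lower = more minimal


def preference_key(c: Candidate) -> tuple[bool, float]:
    # Assumed ranking rule: validity first, then minimality (smaller edit wins).
    return (c.valid, -c.edit_ratio)


def make_pair(cands: list[Candidate]) -> tuple[Candidate, Candidate]:
    """Return (chosen, rejected): the best- and worst-ranked candidates."""
    ranked = sorted(cands, key=preference_key, reverse=True)
    return ranked[0], ranked[-1]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, given summed sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)  # -log(sigmoid(m)) == log(1 + exp(-m))
```

In a real training loop, the four log-probabilities would come from the policy model and a frozen reference model. Ranking validity ahead of minimality is only one way to encode the trade-off, and other weightings are equally plausible.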

Impact: Could improve the interpretability and trustworthiness of LLMs in multilingual settings, with potential benefits for user trust and model debugging.

Ranking rationale: The cluster contains an academic paper describing a new method for improving LLM explanations.

Read on arXiv cs.CL.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · English (EN) · Simon Ostermann · 2026-05-12 06:56

Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization

Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM behavior. Yet extending them beyond…
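The sketch below shows one common way to operationalize the two criteria in the abstract. It is a minimal sketch that assumes whitespace tokenization and a token-level Levenshtein distance normalized by the longer input. The helper names and the toy classifier are hypothetical. A multilingual setup would likely need a language-aware tokenizer, for example character-level edits for Chinese or Japanese.

```python
from typing import Callable, List


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over tokens (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                      # delete tok_a
                           cur[j - 1] + 1,                   # insert tok_b
                           prev[j - 1] + (tok_a != tok_b)))  # substitute
        prev = cur
    return prev[-1]


def minimality(original: str, counterfactual: str) -> float:
    """Normalized edit distance in [0, 1]; lower means a more minimal edit."""
    a, b = original.split(), counterfactual.split()
    denom = max(len(a), len(b)) or 1  # avoid division by zero on empty inputs
    return token_edit_distance(a, b) / denom


def validity(predict: Callable[[str], str], original: str, counterfactual: str) -> bool:
    """A counterfactual is valid if it flips the model's own prediction."""
    return predict(original) != predict(counterfactual)


if __name__ == "__main__":
    # Hypothetical toy classifier standing in for an LLM's prediction.
    def toy(s: str) -> str:
        return "positive" if "good" in s.split() else "negative"

    orig, cf = "the movie was good", "the movie was bad"
    print(validity(toy, orig, cf), minimality(orig, cf))  # True 0.25
```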

