New framework Macro enhances multilingual LLM explanations

By PulseAugur Editorial · [1 source] · 2026-05-12 06:56

Researchers have developed Macro, a preference alignment framework that improves how large language models generate counterfactual explanations across multiple languages. It uses Direct Preference Optimization (DPO) to balance the trade-off between explanation validity and minimality, which has been difficult to manage in non-English languages. In experiments across seven languages, Macro significantly improved the validity of explanations without sacrificing minimality, and it outperformed both chain-of-thought and supervised fine-tuning baselines.
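For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. The summary does not say how Macro builds its preference pairs, so treating the "chosen" response as a valid, minimal counterfactual and the "rejected" response as an invalid or heavily edited one is an assumption made here for illustration only. The function name `dpo_loss` and the default `beta` value are likewise illustrative.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    Assumption (not from the article): "chosen" is a valid, minimal
    counterfactual and "rejected" is an invalid or over-edited one.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# With no preference learned yet, the loss equals log(2).
baseline = dpo_loss(0.0, 0.0, 0.0, 0.0)
# Once the policy favors the chosen response, the loss drops below log(2).
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, which is what lets a preference dataset steer generation toward one side of a trade-off.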

IMPACT Could make LLMs more interpretable in multilingual contexts, which may improve user trust and make model behavior easier to debug.

RANK_REASON The cluster contains an academic paper detailing a new method for improving LLM explanations.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Simon Ostermann · 2026-05-12 06:56

Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization

Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM behavior. Yet extending them beyond…
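The two properties named in the abstract can be made concrete with a small sketch. It assumes a token-level edit distance as the minimality measure and uses a hypothetical keyword classifier, `toy_classifier`, in place of an LLM; the paper's actual metrics and models are not given in this excerpt.

```python
def token_edit_distance(a, b):
    """Levenshtein distance computed over whitespace-separated tokens."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, x_tok in enumerate(x, 1):
        cur = [i]
        for j, y_tok in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                      # deletion
                           cur[j - 1] + 1,                   # insertion
                           prev[j - 1] + (x_tok != y_tok)))  # substitution
        prev = cur
    return prev[-1]


def toy_classifier(text):
    """Hypothetical stand-in for an LLM's sentiment prediction."""
    return "negative" if "boring" in text.split() else "positive"


def evaluate_counterfactual(original, counterfactual, predict):
    """Return (valid, normalized_distance) for one counterfactual.

    valid: the prediction flips between original and counterfactual.
    normalized_distance: token edits divided by the original length
    (lower means a more minimal edit). This normalization is an
    illustrative choice, not necessarily the paper's.
    """
    valid = predict(original) != predict(counterfactual)
    distance = token_edit_distance(original, counterfactual)
    return valid, distance / max(len(original.split()), 1)


valid, cost = evaluate_counterfactual(
    "the film was boring", "the film was great", toy_classifier
)
```

A counterfactual that rewrites the whole sentence may still flip the label, but its normalized distance approaches or exceeds 1, which is the tension between validity and minimality that the abstract describes.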
