A new research paper systematically compares self-generated explanations from instruction-tuned LLMs with human-provided rationales in text classification tasks. The study evaluates the plausibility and faithfulness of these self-explanations across sentiment classification, forced labor detection, and claim verification. Findings indicate that the alignment between LLM self-explanations and human rationales varies with text length and task complexity, though LLMs do produce faithful token-level rationales.
AI IMPACT This research provides insights into the quality and faithfulness of LLM-generated explanations, which is crucial for improving model interpretability and user trust.
AI-generated summary · Google Gemini · from 1 source.