A Systematic Comparison between Extractive Self-Explanations and Human Rationales in Text Classification
A new research paper systematically compares self-generated explanations from instruction-tuned LLMs with human-provided rationales in text classification tasks. The study evaluates the plausibility and faithfulness of these self-explanations across sentiment classification, forced labor detection, and claim verification. Findings indicate that the alignment between LLM self-explanations and human rationales varies with text length and task complexity, though LLMs do produce faithful token-level rationales.
AI IMPACT: This research provides insights into the quality and faithfulness of LLM-generated explanations, which is crucial for improving model interpretability and user trust.
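To make the plausibility comparison concrete, here is a minimal sketch of one common way to score how closely a model's extractive rationale matches a human one: token-level F1 over the selected token positions. The summary does not state which metrics the paper uses, so the choice of F1, the function name `rationale_f1`, and the example sentence are all illustrative assumptions rather than the authors' method.

```python
def rationale_f1(predicted, human):
    """Token-level F1 between two rationales given as token-index collections.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred, gold = set(predicted), set(human)
    # Two empty rationales agree trivially.
    if not pred and not gold:
        return 1.0
    overlap = len(pred & gold)
    # No shared tokens (or one side empty) means no agreement.
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)  # share of model tokens humans also chose
    recall = overlap / len(gold)     # share of human tokens the model recovered
    return 2 * precision * recall / (precision + recall)


# Assumed example: tokens of "the movie was surprisingly good" indexed 0..4.
model_rationale = [3, 4]  # model highlights "surprisingly good"
human_rationale = [4]     # annotator highlights "good"
score = rationale_f1(model_rationale, human_rationale)
print(round(score, 3))
```

A score like this captures only plausibility, that is, agreement with human annotators. Faithfulness, whether the highlighted tokens actually drive the model's prediction, needs a separate test and is not covered by this sketch.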