Brief · PulseAugur

TOOL · arXiv cs.AI · 4d

A Systematic Comparison between Extractive Self-Explanations and Human Rationales in Text Classification

A new research paper systematically compares extractive self-explanations generated by instruction-tuned LLMs with human-provided rationales in text classification. The study evaluates the plausibility and faithfulness of these self-explanations across three tasks: sentiment classification, forced labor detection, and claim verification. The findings indicate that alignment between LLM self-explanations and human rationales varies with text length and task complexity, although the LLMs do produce faithful token-level rationales.

IMPACT The study offers insight into the quality and faithfulness of LLM-generated explanations, both of which matter for model interpretability and user trust.

LLMs
Stephanie Brandl