From Scoring to Explanations: Evaluating SHAP and LLM Rationales for Rubric-based Teaching Quality Assessment
Researchers have developed a new framework to interpret how automated scoring models assign quality ratings to complex language performances, such as classroom transcripts. This framework combines model-agnostic Shapley-value attributions with explanations generated by large language models (LLMs). In tests on the CLASS framework's Quality of Feedback dimension, Shapley values proved more reliable and transferable than LLM-generated rationales for explaining model predictions.
AI IMPACT: Provides a more robust method for evaluating the faithfulness and transferability of explanations from AI models in educational assessment.