Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification
Researchers have developed a fully local AI cascade for de-identifying educational dialogue. It removes sensitive personal information while preserving instructionally valuable content. The system targets two limitations of existing approaches: commercial LLMs require sharing data with an external provider, and traditional named-entity recognition (NER) systems tend to over-redact. The method reframes de-identification as a privacy triage task. A recall-first union proposer surfaces candidates for redaction, and a context-aware reviewer then makes the final Redact/Keep decision for each one. In evaluations, this local configuration achieves a macro F1 score of 0.958, outperforming both same-family LLM baselines and commercial APIs, while running entirely on a single laptop.
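The triage framing described above can be sketched as a two-stage pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. The regex proposers are hypothetical stand-ins for whatever detectors the real system unions. `review` is a hypothetical rule-based stand-in for the local LLM reviewer, and `KEEP_TERMS`, `union_propose`, `deidentify`, and `macro_f1` are names invented for this example. Macro F1 is computed as the unweighted mean of per-class F1 over the Redact and Keep labels. The source does not state the exact evaluation unit (span, token, or turn).

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets

# Hypothetical proposers: stand-ins for the detectors a real system would union.
def propose_emails(text: str) -> List[Span]:
    return [m.span() for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.]+", text)]

def propose_capitalized(text: str) -> List[Span]:
    # Deliberately over-inclusive: every capitalized word becomes a candidate.
    return [m.span() for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]

def union_propose(text: str, proposers: List[Callable[[str], List[Span]]]) -> List[Span]:
    """Recall-first stage: keep every span ANY proposer flags, merging overlapping or touching spans."""
    spans = sorted(s for p in proposers for s in p(text))
    merged: List[Span] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

KEEP_TERMS = {"Okay", "So", "Today"}  # hypothetical non-identifying words

def review(text: str, span: Span) -> str:
    """Hypothetical stand-in for the context-aware reviewer; returns 'Redact' or 'Keep'."""
    surface = text[span[0]:span[1]]
    following = text[span[1]:span[1] + 12].lower()
    if surface in KEEP_TERMS:
        return "Keep"
    # Context cue: named course content ("Newton's law") is not a person in the dialogue.
    if following.startswith("'s law") or following.startswith(" theorem"):
        return "Keep"
    return "Redact"

def deidentify(text: str, proposers) -> Tuple[str, List[Tuple[Span, str]]]:
    decisions = [(s, review(text, s)) for s in union_propose(text, proposers)]
    out = text
    # Replace right-to-left so earlier offsets stay valid.
    for (start, end), label in reversed(decisions):
        if label == "Redact":
            out = out[:start] + "[REDACTED]" + out[end:]
    return out, decisions

def macro_f1(gold: List[str], pred: List[str], labels=("Redact", "Keep")) -> float:
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

if __name__ == "__main__":
    text = "Okay Maria, email me at maria.l@school.edu about Newton's law."
    redacted, _ = deidentify(text, [propose_emails, propose_capitalized])
    print(redacted)  # Okay [REDACTED], email me at [REDACTED] about Newton's law.
```

In this sketch the proposer stage deliberately over-flags ("Okay" and "Newton" are both candidates). The reviewer then restores precision by choosing Keep where context shows a span is not identifying. Splitting recall and precision across two stages in this way is one plausible reading of the triage framing, not a description of the paper's exact components.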
IMPACT: This research suggests that, for specific AI tasks such as de-identification, problem formulation can matter more than model scale.