RoTRAG: Rule of Thumb Reasoning for Conversation Harm Detection with Retrieval-Augmented Generation
Researchers have developed RoTRAG, a framework for detecting harmful content in multi-turn dialogues. It extends retrieval-augmented generation with human-written moral norms, called Rules of Thumb (RoTs), which give the model explicit normative evidence to reason from. A lightweight classifier decides when retrieval-grounded reasoning is needed, so the system can skip retrieval where it would be redundant. Experiments on benchmark datasets show significant improvements in harm classification and severity estimation over existing methods.
AI IMPACT This framework could lead to more reliable and interpretable AI systems for content moderation and safety.
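
The gate-then-retrieve flow described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the example rules, the cue-word gate (standing in for the lightweight classifier), the token-overlap retriever, and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical Rules of Thumb; not taken from the RoTRAG paper or its datasets.
ROTS = [
    "It is wrong to threaten to hurt other people.",
    "It is good to listen when a friend is upset.",
    "It is wrong to mock someone for their appearance.",
]

# Toy cue words standing in for the lightweight classifier (an assumption).
RISK_CUES = {"hurt", "kill", "hate", "ugly", "stupid"}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def needs_retrieval(turns):
    """Gate: True only if at least one turn contains a risk cue word."""
    return any(tokenize(turn) & RISK_CUES for turn in turns)


def retrieve_rots(turns, k=1):
    """Return the k rules with the highest Jaccard token overlap with the dialogue."""
    query = tokenize(" ".join(turns))

    def score(rot):
        rot_tokens = tokenize(rot)
        return len(query & rot_tokens) / len(query | rot_tokens)

    return sorted(ROTS, key=score, reverse=True)[:k]


def build_prompt(turns):
    """Assemble a prompt, adding retrieved rules only when the gate fires."""
    lines = ["Conversation:"] + [f"- {turn}" for turn in turns]
    if needs_retrieval(turns):
        lines.append("Relevant rules of thumb:")
        lines += [f"- {rot}" for rot in retrieve_rots(turns)]
    lines.append("Is the last turn harmful? Answer with a label and a severity.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(["He laughed at me.", "I will hurt him tomorrow."]))
```

In this sketch a benign exchange bypasses retrieval entirely, which is the source of the computational saving the summary mentions. A real system would presumably replace the cue-word gate with a trained classifier and the overlap score with a dense or sparse retriever.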