Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 8h

RoTRAG: Rule of Thumb Reasoning for Conversation Harm Detection with Retrieval-Augmented Generation

Researchers have introduced RoTRAG, a framework for detecting harmful content in multi-turn dialogues. It extends retrieval-augmented generation by incorporating human-written moral norms, called Rules of Thumb (RoTs), which serve as explicit normative evidence for the model's reasoning. A lightweight classifier decides when retrieval-grounded reasoning is needed, so the system avoids redundant computation on turns that do not require it. In experiments on benchmark datasets, the authors report significant improvements in harm classification and severity estimation over existing methods.
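The gating idea described above can be pictured with a small sketch. This is not the authors' implementation: the RoT corpus, the keyword-based gate (standing in for the lightweight classifier), the word-overlap retriever, and every function name here are assumptions made purely for illustration.

```python
import re

# Hypothetical mini corpus of Rules of Thumb (RoTs); not taken from the paper.
ROTS = [
    "It is wrong to threaten to hurt someone.",
    "You should not encourage someone to harm themselves.",
    "It is good to listen when a friend is upset.",
]

# Hypothetical cue words used by the stand-in gate below.
RISK_CUES = {"hurt", "kill", "harm", "threaten", "weapon"}


def tokens(text):
    """Lowercase word set for a piece of text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def needs_retrieval(dialogue):
    """Stand-in for the lightweight classifier: True if any turn has a risk cue."""
    return any(tokens(turn) & RISK_CUES for turn in dialogue)


def retrieve_rots(dialogue, k=2):
    """Toy retriever: rank RoTs by word overlap with the whole dialogue."""
    query = set().union(*(tokens(turn) for turn in dialogue))
    ranked = sorted(ROTS, key=lambda rot: len(tokens(rot) & query), reverse=True)
    return ranked[:k]


def build_prompt(dialogue):
    """Assemble the text a downstream model would reason over.

    RoTs are retrieved only when the gate fires, so benign dialogues
    skip the retrieval step entirely.
    """
    use_retrieval = needs_retrieval(dialogue)
    rots = retrieve_rots(dialogue) if use_retrieval else []
    lines = ["Dialogue:"] + [f"- {turn}" for turn in dialogue]
    if rots:
        lines += ["Relevant Rules of Thumb:"] + [f"- {rot}" for rot in rots]
    lines.append("Is this conversation harmful, and how severe is it?")
    return {"used_retrieval": use_retrieval, "rots": rots, "prompt": "\n".join(lines)}


benign = ["I had a rough day.", "Want to talk about it?"]
risky = ["He made me so angry.", "I want to hurt him."]
benign_result = build_prompt(benign)
risky_result = build_prompt(risky)
```

In this sketch the benign dialogue bypasses retrieval, while the risky one is paired with the best-matching RoTs before being passed on. A real system would replace the keyword gate with a trained classifier and the overlap score with a proper retriever.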

IMPACT This framework could lead to more reliable and interpretable AI systems for content moderation and safety.

Hugging Face
arXiv
Wonduk Seo
RoTRAG
Rules of Thumb: An Investigation Into The Potential Of Contextual Transposition In Social Design
ProsocialDialog
Safety Reasoning Multi Turn Dialogue