Researchers have developed RoTRAG, a framework for detecting harmful content in multi-turn dialogues. It extends retrieval-augmented generation with human-written moral norms, termed Rules of Thumb (RoTs), which serve as explicit normative evidence for reasoning. RoTRAG also uses a lightweight classifier to decide when retrieval-grounded reasoning is necessary, which reduces redundant computation. Experiments on benchmark datasets show significant improvements in harm classification and severity estimation over existing methods.
AI IMPACT This framework could lead to more reliable and interpretable AI systems for content moderation and safety.
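The gated-retrieval idea in the summary can be illustrated with a minimal Python sketch. It is not the RoTRAG implementation: the RoT list, the cue-word gate standing in for the lightweight classifier, the Jaccard-overlap retriever, and every name below (`ROTS`, `RISK_CUES`, `needs_retrieval`, `retrieve_rots`, `assess`) are assumptions made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical RoT store; a real system would draw on a curated corpus.
ROTS = [
    "It is wrong to threaten other people.",
    "You should not encourage someone to hurt themselves.",
    "It is good to comfort a friend who is upset.",
]

# Hypothetical cue words standing in for a trained lightweight classifier.
RISK_CUES = {"hurt", "threaten", "kill", "hate"}


def tokens(text: str) -> set[str]:
    """Lowercase word set used by both the gate and the retriever."""
    return set(re.findall(r"[a-z']+", text.lower()))


def needs_retrieval(dialogue: list[str]) -> bool:
    """Toy gate: fire only if some turn contains a risk cue."""
    return any(tokens(turn) & RISK_CUES for turn in dialogue)


def retrieve_rots(dialogue: list[str], k: int = 2) -> list[str]:
    """Rank RoTs by Jaccard overlap with the whole dialogue, keep the top k."""
    query = tokens(" ".join(dialogue))

    def score(rot: str) -> float:
        rot_tokens = tokens(rot)
        return len(query & rot_tokens) / len(query | rot_tokens)

    return sorted(ROTS, key=score, reverse=True)[:k]


@dataclass
class Assessment:
    retrieved: list[str]   # RoTs passed on as normative evidence
    used_retrieval: bool   # whether the gate triggered retrieval


def assess(dialogue: list[str]) -> Assessment:
    """Skip retrieval for dialogues the gate considers benign."""
    if not needs_retrieval(dialogue):
        return Assessment(retrieved=[], used_retrieval=False)
    return Assessment(retrieved=retrieve_rots(dialogue), used_retrieval=True)


benign = ["How was your day?", "Pretty good, thanks."]
risky = ["I am so angry at my brother.", "I want to hurt him."]
print(assess(benign))
print(assess(risky))
```

In this sketch the benign dialogue never reaches the retriever, which is the source of the computational saving; the risky dialogue retrieves the RoTs that a downstream reasoning step would cite as evidence.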
RANK_REASON The cluster describes a research paper published on arXiv detailing a new AI framework.
- arXiv
- Hugging Face
- ProsocialDialog
- RoTRAG
- Rules of Thumb: An Investigation Into The Potential Of Contextual Transposition In Social Design
- Safety Reasoning Multi Turn Dialogue
- Wonduk Seo
AI-generated summary · Google Gemini · from 1 source.