Researchers have developed CAREBench, a new benchmark designed to evaluate child-safety risks in language models. Unlike previous evaluations that focused on explicit abuse material, CAREBench assesses upstream risks such as grooming, deception, privacy violations, and emotional dependency. The benchmark, which includes 500 prompts across twelve categories and was annotated by parents and clinicians, aims to help AI developers identify and address potential harms before they become overt. Initial evaluations of seven frontier models revealed failure rates ranging from 2% to 58%, highlighting significant gaps in current child-safety protocols.
AI IMPACT This benchmark could drive improvements in AI safety by giving developers a tool to identify and mitigate risks related to child exploitation and manipulation.
AI-generated summary · Google Gemini · from 2 sources.