Researchers have developed a new dataset, AIriskEval-edu-db2, for training and evaluating large language models (LLMs) that assess pedagogical risks in K-12 educational content. The dataset contains more than 1,600 explanations of science, language arts, and social science questions, pairing human-written explanations with LLM-generated ones that simulate distinct pedagogical risks. It also includes structured annotations that localize and describe the risks, validated by expert teachers. In experiments, fine-tuning a local Llama 3.1 8B model on the dataset brings it close to the performance of stronger frontier models on risk detection and explainability assessment, while maintaining privacy.
AI IMPACT This dataset could improve the safety and reliability of AI-generated educational content for K-12 students.
RANK_REASON The cluster describes a new dataset for AI risk assessment in education, which falls under research.
AI-generated summary · Google Gemini · from 2 sources.