Grounded but Misleading: Evaluating Semantic Alignment in AI-Generated Security Explanations
Researchers have developed a new testbed called VEXA to evaluate AI-generated security explanations, specifically focusing on scam detection. The study found that explanations can appear grounded in evidence while semantically weakening or misdirecting the perceived risk. Even when explanations were less helpful or provided weaker reasoning, they still scored relatively high on perceived evidence grounding, highlighting a "grounding illusion" effect in AI security explanations.
AI IMPACT Highlights the need for advanced evaluation metrics beyond simple evidence citation for trustworthy AI security tools.