EleutherAI researchers have published an update on their Mechanistic Anomaly Detection (MAD) work, finding that Llama 3.1 8B exhibits less "quirky" behavior on non-arithmetic tasks than Mistral 7B v0.1. Their MAD approaches, which previously performed strongly, struggled with Llama 3.1, indicating that models can behave anomalously in ways their current methods may miss. The team also explored alternative detection methods based on normalising flows and sparse autoencoders, which performed similarly to existing techniques on raw activations.
Summary written by gemini-2.5-flash-lite from 2 sources.