The Estonian Language Institute, in collaboration with Propastop, has developed a new benchmark to evaluate how well large language models resist Russian propaganda. The test posed questions in English, Estonian, and Russian that were designed to elicit misinformation or propaganda narratives. Anthropic's Claude models, particularly Opus 4.7, performed best among proprietary frontier models, achieving an exemplary score on 77% of the test questions.
AI IMPACT This benchmark highlights the potential for LLMs to be influenced by state-sponsored propaganda, emphasizing the need for robust safety measures and further research into model alignment.
RANK_REASON The cluster describes a new benchmark and evaluation of LLMs, which falls under research.
AI-generated summary · Google Gemini · from 1 source.