A new benchmark, SomaliBench v0, evaluates the safety refusal behavior of open-weight language models in Somali, a low-resource language. The study found significant gaps between English and Somali refusal rates for models including Llama-3.1-8B-Instruct, Aya-23-8B, Qwen-2.5-7B-Instruct, and Gemma-2-9B-Instruct. For many models, a non-refusal in Somali often produced unclear or incoherent output rather than direct harmful compliance.
AI IMPACT Highlights the need for more robust safety evaluations in low-resource languages, potentially influencing future model development and testing.
Read on Hugging Face Daily Papers →
- Aya-23-8B
- Claude Sonnet
- Gemma-2-9B-Instruct
- Khalid Yusuf Dahir
- Llama-3.1-8B-Instruct
- Qwen-2.5-7B-Instruct
- SomaliBench
AI-generated summary · Google Gemini · from 3 sources.