A comprehensive analysis of 13 modified versions of Google's Gemma 4 E2B model found that all variants significantly improved safety by increasing the refusal rate, and that some also enhanced reasoning capabilities. Specifically, two variants, coder3101 and llmfan46, outperformed the base model on the GSM8K math benchmark. However, more aggressive modifications led to a notable decrease in language modeling performance and reasoning efficiency, with some variants showing significantly higher perplexity and a higher rate of empty responses.
IMPACT Demonstrates that model fine-tuning can improve specific capabilities like safety and reasoning, but aggressive methods risk degrading core performance.
RANK_REASON Analysis of multiple fine-tuned variants of an existing open-source model.
- coder3101
- Duoneural
- EtherOpus
- Gemma 4 E2B
- Huihui
- llmfan46
- treadon
- TrevorJS
- Wangzhang
- WWT CyberLab
AI-generated summary · Google Gemini · from 1 source.