Developer trains 75M-parameter LLM that outperforms a larger model on IFEval

By PulseAugur Editorial · [1 source] · 2026-06-02 17:41

A developer has trained KeyLM, a 75 million parameter language model, from scratch on 18 billion pre-training tokens. The instruction-tuned version scores higher on the IFEval instruction-following benchmark than SmolLM-135M-Instruct, despite having significantly fewer parameters and less training data. On other benchmarks KeyLM performs as expected for its size, and it is noted to hallucinate frequently on knowledge-based tasks.
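For a rough sense of scale, the figures quoted above can be turned into two ratios: how much larger the baseline is, and how many pre-training tokens KeyLM saw per parameter. This is only a back-of-the-envelope sketch. The 135M figure is read off the SmolLM-135M-Instruct model name and is an assumption, not an exact parameter count.

```python
# Figures quoted in the summary; 135M is taken from the baseline's name
# (an assumption, not an exact parameter count).
keylm_params = 75_000_000
keylm_tokens = 18_000_000_000
smollm_params = 135_000_000

# How much larger the baseline model is than KeyLM.
size_ratio = smollm_params / keylm_params

# Pre-training tokens seen per KeyLM parameter.
tokens_per_param = keylm_tokens / keylm_params

print(f"Baseline is {size_ratio:.1f}x the size of KeyLM")
print(f"KeyLM saw {tokens_per_param:.0f} pre-training tokens per parameter")
```

The size ratio comes out to 1.8, which matches the original post's description of the baseline as "almost double" KeyLM's size.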

IMPACT Demonstrates efficient training of smaller models for specific tasks, potentially lowering the barrier for custom LLM development.

RANK_REASON An individual developer released a custom-trained model with benchmark results.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/cakes_and_candles · 2026-06-02 17:41

I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size

I trained a small language model from scratch called KeyLM. It is 75M params, decoder-only, and there is a pretrained base, an instruction-tuned version, and a GGUF.

On IFEval (instruction following) the 75M instruct model scores slightly …
