I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size
A developer has trained a 75 million parameter language model called KeyLM from scratch, using 18 billion tokens for pre-training. The instruction-tuned version of KeyLM outperforms SmolLM-135M-Instruct on the IFEval benchmark, despite having significantly fewer parameters and less training data. While KeyLM excels at instruction following, it performs as expected for its size on other benchmarks and is noted to hallucinate frequently on knowledge-based tasks.
IMPACT: Demonstrates that smaller models can be trained efficiently for specific tasks, potentially lowering the barrier to custom LLM development.