I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size

Developer trains a 75-million-parameter large language model that outperforms a larger model

By the PulseAugur editorial team · [1 source] · 2026-06-02 17:41

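A developer trained a 75-million-parameter language model called KeyLM from scratch, pretraining it on 18 billion tokens. The instruction-tuned version of KeyLM outperforms SmolLM-135M-Instruct on the IFEval benchmark, even though KeyLM has significantly fewer parameters and was trained on significantly less data. KeyLM does well at instruction following, but on other benchmarks it performs roughly as expected for its size, and it hallucinates frequently on knowledge-based tasks.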

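Impact: The result shows that small models can be trained efficiently for specific tasks, which could lower the barrier to developing custom large language models.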

Ranking rationale: An independent developer released a custom-trained model along with its benchmark results.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · /u/cakes_and_candles · 2026-06-02 17:41

I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size

I trained a small language model from scratch called KeyLM. It is 75M params, decoder-only, and there is a pretrained base, an instruction-tuned version, and a GGUF.

On IFEval (instruction following) the 75M instruct model scores slightly …
