New GEM Framework Takes a Geometric Approach to LLM Data Curation

By PulseAugur Editorial · [1 source] · 2026-05-27 04:00

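Researchers have introduced GEM (Geometric Entropy Mixing), a new framework for optimizing data curation for large language models (LLMs). GEM recasts data mixing as a variational problem on the hypersphere and adds a mixing-balance regularizer to overcome the limitations of existing categorization methods such as human taxonomies and Euclidean clustering. The framework uses a provable minimization-maximization algorithm to discover balanced semantic structure, and when integrated with existing mixing strategies it shows improvements of up to 1.2% in average downstream accuracy.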
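To make the idea of balanced clustering on a hypersphere concrete, here is a minimal sketch in Python. It is not the algorithm from the GEM paper, whose details are not given in this summary. It is a generic soft spherical k-means heuristic in which cosine similarity drives the assignments and an entropy-style penalty discourages any one cluster from absorbing most of the data. The function names and the parameters `balance` and `kappa` are hypothetical, chosen only for illustration.

```python
import numpy as np


def normalize_rows(x):
    """Project each row onto the unit hypersphere (L2 normalization)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def balanced_spherical_clusters(embeddings, k, balance=1.0, kappa=10.0,
                                iters=50, seed=0):
    """Illustrative heuristic only, NOT the GEM algorithm.

    balance: assumed weight of the penalty on over-full clusters.
    kappa:   assumed sharpness applied to cosine similarity.
    Returns soft assignments, unit-norm centers, and mixing proportions.
    """
    rng = np.random.default_rng(seed)
    x = normalize_rows(np.asarray(embeddings, dtype=float))
    # Initialize centers from randomly chosen data points.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    log_pi = np.full(k, -np.log(k))  # start from uniform proportions
    for _ in range(iters):
        # Cosine similarity, minus a penalty that grows with cluster share.
        logits = kappa * (x @ centers.T) - balance * log_pi
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # Update mixing proportions and re-project centers onto the sphere.
        pi = resp.mean(axis=0)
        log_pi = np.log(np.clip(pi, 1e-12, None))
        centers = normalize_rows(resp.T @ x)
    return resp, centers, pi


if __name__ == "__main__":
    demo = np.random.default_rng(1).normal(size=(200, 8))
    _, _, proportions = balanced_spherical_clusters(demo, k=4)
    print(proportions)
```

Setting `balance=0.0` reduces the sketch to ordinary soft spherical k-means, which gives a simple baseline for seeing what the balance term changes in the resulting proportions.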

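Impact: This geometric approach to data curation could make LLM training more efficient and effective, which may in turn improve model performance on downstream tasks.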

Ranking rationale: This cluster contains a research paper detailing a new framework for LLM data curation.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Yue Min, Ziyun Qiao, Ruining Chen, Yujun Li · 2026-05-27 04:00

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

arXiv:2605.26121v1 Announce Type: cross Abstract: LLM pre-training efficacy increasingly depends on data composition rather than sheer volume. Yet, optimal mixing is hindered by categorization flaws: human taxonomies suffer from ontological misalignment, and Euclidean clustering …
