PulseAugur

New "Goggles" module trains large language models to distinguish fiction from fact

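Researchers have developed a new module called "Goggles" that can be applied during language model finetuning to instill a specific epistemic frame, such as recognizing content as fictional. The module works by editing the gradients the model receives rather than changing the training data itself. When trained to recognize fictional content, models equipped with Goggles correctly identified fictional claims about 91% of the time, up from a 9% baseline, while retaining their overall capabilities. Goggles modules can also be trained for other frames, such as treating documents as part of an AI safety evaluation, and the instilled frame persists under continued finetuning.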
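To make the idea of editing gradients rather than data concrete, here is a minimal sketch in plain Python. It is not the paper's method: the toy linear model, the `edit_gradient` projection, and the `protected` direction are all assumptions chosen for illustration. The only point it shows is that the training examples stay untouched while a function rewrites each gradient before the weight update.

```python
def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def loss_gradient(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    err = dot(w, x) - y
    return [err * xi for xi in x]


def edit_gradient(grad, direction):
    """Hypothetical edit: drop the gradient component along a unit vector.

    This projection is an illustrative stand-in, not the edit used by Goggles.
    """
    coeff = dot(grad, direction)
    return [g - coeff * d for g, d in zip(grad, direction)]


def finetune(w, data, direction=None, lr=0.1, epochs=50):
    """Plain SGD; if `direction` is given, every gradient is edited first."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            grad = loss_gradient(w, x, y)
            if direction is not None:
                grad = edit_gradient(grad, direction)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# The same training data is used in both runs; only the gradients differ.
data = [([1.0, 1.0], 2.0), ([1.0, 0.0], 1.0)]
protected = [0.0, 1.0]  # hypothetical direction the edit shields from updates

w_plain = finetune([0.0, 0.0], data)
w_edited = finetune([0.0, 0.0], data, direction=protected)
```

In this toy setup the unedited run moves both weights toward fitting the data, while the edited run never changes the weight along `protected`, even though it saw exactly the same examples.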

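Impact: This research offers a potential way to train language models on misaligned data without the model absorbing the undesirable behavior, and improves their ability to distinguish factual from fictional content.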

Ranking rationale: This cluster contains an academic paper detailing a new method for training language models.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI · English · Joshua Penman

    Epistemic Goggles: A Pretrained Module that Induces an Epistemic Frame via Gradient Editing

    arXiv:2607.01690v1. Abstract: Finetuning a language model on documents that are explicitly annotated as fictional results in a model that still actually believes the documents' core claims, an effect known as Negation Neglect. In our evaluations, models trained …

  2. arXiv cs.CL · English · Joshua Penman

    Epistemic Goggles: A Pretrained Module that Induces an Epistemic Frame via Gradient Editing

    Finetuning a language model on documents that are explicitly annotated as fictional results in a model that still actually believes the documents' core claims, an effect known as Negation Neglect. In our evaluations, models trained on documents prefixed and suffixed with such ann…