
Transformer² fine-tuning method tunes a model's existing parameters

By the PulseAugur editorial team · [1 source] · 2026-06-29 21:47

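A fine-tuning method called Transformer² (ICLR 2025) proposes specializing AI models by adjusting parameters they already have rather than adding new ones. The method fine-tunes only the singular values of each weight matrix, where each singular value is the gain along one specific input direction. It is reportedly the singular value fine-tuning (SVF) technique behind Sakana AI's Fugu model, and it is said to outperform LoRA while using far fewer parameters.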
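To make the "turning existing dials" idea concrete, here is a minimal NumPy sketch of rescaling the singular values of a single weight matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `svf_adapt`, the scaling vector `z`, the matrix shape, and the LoRA rank `r` are all hypothetical, and how `z` would actually be trained is not covered here.

```python
import numpy as np

def svf_adapt(W, z):
    """Rescale the singular values of W by z (hypothetical helper).

    U and Vt stay frozen; only the per-direction gains change.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # U * (s * z) scales column i of U by s[i] * z[i],
    # which equals U @ diag(s * z).
    return (U * (s * z)) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))  # stand-in for one weight matrix
r = 4                              # assumed LoRA rank, for comparison only

# With z = 1 everywhere, the matrix is unchanged.
z = np.ones(min(W.shape))
identity_ok = np.allclose(svf_adapt(W, z), W)

# Doubling one entry of z doubles the gain along that direction only.
z[0] = 2.0
W_new = svf_adapt(W, z)
s_old = np.linalg.svd(W, compute_uv=False)
s_new = np.linalg.svd(W_new, compute_uv=False)

# Trainable parameters for this one matrix.
svf_params = min(W.shape)                     # one scale per singular value
lora_params = r * (W.shape[0] + W.shape[1])   # low-rank factors B (m x r), A (r x n)
print(svf_params, lora_params)
```

For an m × n matrix, this kind of scaling needs min(m, n) trainable values, while a rank-r LoRA adapter needs r(m + n), which is where the parameter saving in the comparison comes from.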

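Impact: This approach could make model specialization more efficient and require fewer parameters, which may lower the compute cost of fine-tuning.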

Ranking rationale: This cluster describes a new fine-tuning method proposed in an upcoming conference paper.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

Mastodon — mastodon.social TIER_1 English(EN) · [email protected] · 2026-06-29 21:47


What if specializing a model meant turning dials it already has, not adding new ones? A weight matrix amplifies its input along built-in directions, and each singular value is the gain on one: how strongly that direction comes through. Transformer² (ICLR 2025) fine-tunes only tho…

Link: benjaminhan.net/…/20260629-transformer-sq…
