PulseAugur
Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

LLMs can now verbalize confidence scores, outperforming supervised methods

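A new research paper examines zero-shot confidence estimation for small language models and shows that simple methods can outperform supervised baselines. The study finds that mean token log-probability, which requires no training data, matches or exceeds supervised methods at estimating whether a model's answer is correct. This matters for cost-saving strategies such as local-to-cloud routing, in which a cheap local model handles most queries and expensive cloud calls are reserved for hard cases.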
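A minimal Python sketch of how a mean token log-probability score could drive local-to-cloud routing. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `threshold` value of -0.5, and the idea that the local model exposes per-token log-probabilities as a list of floats are all hypothetical.

```python
from typing import Sequence


def mean_token_logprob(token_logprobs: Sequence[float]) -> float:
    """Average the log-probabilities of the generated tokens.

    Values closer to 0 mean the model assigned higher probability
    to its own output, which is read here as higher confidence.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def route(token_logprobs: Sequence[float], threshold: float = -0.5) -> str:
    """Keep the local answer if confidence clears the threshold.

    The threshold is a hypothetical value; in practice it would be
    chosen to balance cloud cost against answer quality.
    """
    score = mean_token_logprob(token_logprobs)
    return "local" if score >= threshold else "cloud"


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities for two local answers.
    print(route([-0.1, -0.2, -0.3]))  # mean -0.2, above the threshold
    print(route([-1.0, -2.0]))        # mean -1.5, below the threshold
```

Because the score needs only the log-probabilities the local model already produces during generation, no labeled data or extra training step is involved; the only tunable quantity in this sketch is the threshold.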

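Impact: By improving small language models' ability to assess their own answers, this research could reduce reliance on expensive cloud resources and enable more efficient deployment.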

Ranking rationale: This cluster contains an academic paper detailing a new method for evaluating small language models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Daniel Yang, Yao-Hung Hubert Tsai, Makoto Yamada ·

    On Verbalized Confidence Scores for LLMs

    arXiv:2412.14737v2 Announce Type: replace Abstract: The rise of large language models (LLMs) and their tight integration into our daily life make it essential to dedicate efforts towards their trustworthiness. Uncertainty quantification for LLMs can establish more human trust int…

  2. arXiv cs.CL TIER_1 English(EN) · Luong N. Nguyen ·

    Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

    arXiv:2605.02241v1 Announce Type: cross Abstract: How reliably can a small language model estimate its own correctness? The answer determines whether local-to-cloud routing (escalating queries a cheap local model cannot handle) can work without supervised training data. As inferenc…

  3. arXiv cs.CL TIER_1 English(EN) · Luong N. Nguyen ·

    Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

    How reliably can a small language model estimate its own correctness? The answer determines whether local-to-cloud routing (escalating queries a cheap local model cannot handle) can work without supervised training data. As inference costs dominate large language model (LLM) deploy…