Unstable Rankings in Bayesian Deep Learning Evaluation

Studies find Bayesian deep learning evaluation is unstable in low-data settings

By the PulseAugur editorial team · [2 sources] · 2026-04-28 04:00

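Two new arXiv papers highlight significant instability in how Bayesian deep learning methods are evaluated, especially when data are scarce. The researchers find that standard evaluation metrics yield unreliable, dataset-dependent rankings, so which method appears superior can vary widely with the specific dataset and the sample size. The studies suggest that current evaluation practice may mislead practitioners, and they propose using uncertainty-aware methods and reporting variance trajectories to give a more robust assessment of model performance.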

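Impact: Highlights the potential unreliability of current evaluation methods for Bayesian deep learning and urges practitioners to adopt uncertainty-aware evaluation.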

Ranking rationale: Two academic papers published on arXiv that discuss problems in how Bayesian deep learning models are evaluated.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Qishi Zhan, Minxuan Hu, Guansu Wang, Jiaxin Liu, Liang He · 2026-04-28 04:00

Unstable Rankings in Bayesian Deep Learning Evaluation

arXiv:2604.23102v1 Announce Type: new Abstract: Standard evaluations of Bayesian deep learning methods assume that metric estimates are reliable, but we show this assumption fails under data scarcity. Method rankings are not only unreliable at small $n$, but also dataset-dependen…
arXiv cs.LG TIER_1 English(EN) · Qishi Zhan, Minxuan Hu, Liang He, Guansu Wang, Jiaxin Liu · 2026-04-28 04:00

A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning

arXiv:2604.23114v1 Announce Type: new Abstract: In limited-data settings, a single endpoint mean of an evaluation metric such as the Continuous Ranked Probability Score (CRPS) is itself a random variable, yet it is routinely reported as if it were a stable property of the method.…
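
To make the point about a single endpoint mean being a random variable concrete, here is a minimal sketch in Python. It is not the papers' method or experimental setup: the synthetic data (observations drawn from a standard normal), the two hypothetical Gaussian forecasters, and the helper names `mean_crps`, `spread_across_seeds`, and `flip_rate` are all assumptions made for illustration. The only borrowed ingredient is the standard closed-form CRPS for a Gaussian forecast.

```python
import math
import random
import statistics


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(n, sigma_pred, rng):
    """Mean CRPS over n synthetic test points y ~ N(0, 1) (an assumed toy setup),
    scored against the forecast N(0, sigma_pred^2)."""
    return statistics.fmean(
        gaussian_crps(0.0, sigma_pred, rng.gauss(0.0, 1.0)) for _ in range(n)
    )


def spread_across_seeds(n, sigma_pred, seeds):
    """Standard deviation of the mean CRPS across independent seeds."""
    scores = [mean_crps(n, sigma_pred, random.Random(s)) for s in seeds]
    return statistics.stdev(scores)


def flip_rate(n, sigma_a, sigma_b, seeds):
    """Fraction of seeds where forecaster B gets a lower (better) mean CRPS than A."""
    flips = 0
    for s in seeds:
        # Reuse the seed so both forecasters are scored on the same test draw.
        a = mean_crps(n, sigma_a, random.Random(s))
        b = mean_crps(n, sigma_b, random.Random(s))
        flips += b < a
    return flips / len(seeds)


if __name__ == "__main__":
    seeds = range(200)
    for n in (10, 100, 1000):
        # Columns: test-set size, spread of the mean CRPS, ranking flip rate.
        print(n, spread_across_seeds(n, 1.0, seeds), flip_rate(n, 1.0, 1.3, seeds))
```

In this toy setup forecaster A (`sigma_a=1.0`) matches the data-generating distribution and B (`sigma_b=1.3`) is overdispersed, so A is better in expectation. Comparing the printed spread and flip rate across test-set sizes gives a rough feel for how much a single-seed number can move and how often a small test set would rank the two the other way.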

