
LLM hallucination linked to commitment failure; new quantization framework introduced

By the PulseAugur editorial team · [5 sources] · 2026-04-28 10:23

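A new paper argues that LLM hallucination stems not from missing knowledge but from commitment failure: the model spreads probability mass across several alternative answers instead of concentrating it on the correct one. The effect grows with model scale and is worsened by instruction tuning. A second paper introduces GAMMA, a framework for mixed-precision quantization that optimizes bit allocation for LLMs, significantly improving accuracy under memory constraints and outperforming existing methods on Llama and Qwen models. In addition, a benchmark called SciEval was developed for automatically evaluating K-12 science instructional materials; results show that current mainstream LLMs perform poorly on this task without domain-specific fine-tuning.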
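To make the idea of spreading probability mass concrete, here is a minimal toy sketch in Python. It is an illustration only, not the paper's metric: the function name `commitment_stats`, the candidate answers, and the probability values are all hypothetical, chosen to contrast a concentrated answer distribution with a diffuse one.

```python
import math

def commitment_stats(answer_probs, correct):
    """Summarize how concentrated a toy answer distribution is.

    answer_probs: dict mapping candidate answer -> probability mass
                  (hypothetical values, not taken from any paper).
    correct:      the answer treated as correct for this illustration.
    """
    total = sum(answer_probs.values())
    # Normalize so the masses sum to 1 even if the inputs do not.
    probs = {a: p / total for a, p in answer_probs.items()}
    top = max(probs, key=probs.get)
    # Shannon entropy in bits: higher means the mass is more spread out.
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return {
        "correct_mass": probs.get(correct, 0.0),
        "top_answer": top,
        "top_mass": probs[top],
        "entropy_bits": entropy,
    }

# A model that commits: most of the mass sits on one answer.
committed = commitment_stats(
    {"Paris": 0.90, "Lyon": 0.05, "Marseille": 0.05}, "Paris"
)

# A model that "knows" the answer but does not commit to it.
diffuse = commitment_stats(
    {"Paris": 0.30, "Lyon": 0.25, "Marseille": 0.25, "Nice": 0.20}, "Paris"
)

print(committed)
print(diffuse)
```

In the diffuse case the correct answer is still the single most likely candidate, yet a response sampled from that distribution would be wrong about 70% of the time, since only 0.30 of the mass sits on it.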

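Impact: The new research clarifies the mechanism behind LLM hallucination and introduces new methods for model optimization and evaluation, which may improve reliability and efficiency.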

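Ranking rationale: This cluster contains several academic papers detailing research results on LLM behavior and optimization techniques.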


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.CL TIER_1 English(EN) · Jewon Yeom, Jaewon Sok, Heejun Kim, Seonghyeon Park, Jeongjae Park, Taesup Kim · 2026-05-22 04:00

Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

arXiv:2605.22007v1. Hallucination is often viewed as a direct consequence of missing knowledge: a model answers incorrectly when the correct answer is absent from its generation-time distribution, and correctly when it is present. We test this assumpti…

arXiv cs.CL TIER_1 English(EN) · Taesup Kim · 2026-05-21 05:08

Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

Hallucination is often viewed as a direct consequence of missing knowledge: a model answers incorrectly when the correct answer is absent from its generation-time distribution, and correctly when it is present. We test this assumption by introducing a semantic notion of answer av…

arXiv cs.AI TIER_1 English(EN) · Xu Han · 2026-05-18 14:30

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

Mixed-precision quantization improves the budget-accuracy trade-off for large language models (LLMs) by allocating more bits to sensitive modules. However, automating this allocation at LLM scale faces a unique combination of constraints: learnable approaches require quantizatio…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-18 14:30

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets

Mixed-precision quantization improves the budget-accuracy trade-off for large language models (LLMs) by allocating more bits to sensitive modules. However, automating this allocation at LLM scale faces a unique combination of constraints: learnable approaches require quantizatio…

arXiv cs.AI TIER_1 English(EN) · Jinjun Xiong · 2026-04-28 10:23

SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials

The need to evaluate instructional materials for K-12 science education has become increasingly important, as more educators use generative AI to create instructional materials. However, the review of instructional materials is time-consuming, expertise-intensive, and difficult t…
