PulseAugur

New research tackles large language model hallucination with novel methods and benchmarks

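Several research papers posted on arXiv address the challenge of hallucination in large language models and vision-language models. One paper introduces In-Context Visual Contrastive Optimization (IC-VCO), which mitigates multimodal hallucination by using contrastive images within a shared context together with a novel sample-editing strategy. Another study investigates the architectural factors that affect hallucination robustness, categorizing hallucinations and offering guidance for model design. A new framework, BenHalluEval, is also proposed for evaluating and detecting hallucination in Bengali language models, highlighting the shortcomings of existing methods on low-resource languages. Other work reframes hallucination detection as out-of-distribution detection and examines how prompt toxicity affects factual reliability.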

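Impact: These studies offer new techniques and benchmarks for improving the accuracy and reliability of large language models, which is essential for deploying them safely in sensitive applications.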

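Ranking rationale: Multiple academic papers posted on arXiv that propose new methods and evaluations for large language model hallucination.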


AI-generated summary · Google Gemini · drawn from 153 sources.


Coverage sources [153]

  1. arXiv cs.CL TIER_1 English(EN) · Hao Yin, Guangzong Si, Zilei Wang ·

    The Illusion of Performance Gains: Why Does Contrastive Decoding Fail to Mitigate Object Hallucination in Multimodal Large Models?

    arXiv:2504.10020v4 Announce Type: replace Abstract: Contrastive decoding strategies are widely used to reduce object hallucinations in multimodal large language models (MLLMs). These methods work by constructing contrastive samples to induce hallucinations and then suppressing th…

  2. arXiv cs.AI TIER_1 English(EN) · Saroj Mishra ·

    Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation

    arXiv:2606.04435v1 Announce Type: new Abstract: Multi-step agentic retrieval-augmented generation (RAG) pipelines have demonstrated significant capability for complex reasoning tasks, yet remain vulnerable to a class of failure that existing hallucination detection mechanisms sys…

  3. arXiv cs.AI TIER_1 English(EN) · Bodla Krishna Vamshi, Rohan Bhatnagar, Haizhao Yang ·

    Geometry-Aware Hallucination Detection in Large Language Models

    arXiv:2601.06196v3 Announce Type: replace-cross Abstract: Large language models (LLMs) frequently generate factually incorrect or unsupported content, commonly referred to as hallucinations. Prior work has explored decoding strategies, retrieval augmentation, and supervised fine-…

  4. arXiv cs.CL TIER_1 English(EN) · Litian Liu, Reza Pourreza, Sunny Panchal, Apratim Bhattacharyya, Yubing Jian, Yao Qin, Roland Memisevic ·

    Enhancing Hallucination Detection through Noise Injection

    arXiv:2502.03799v4 Announce Type: replace Abstract: Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. Effectively detecting hallucinations is therefore crucial for the safe deployment of LLMs. Recent research has link…

  5. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Saroj Mishra ·

    Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation

    Multi-step agentic retrieval-augmented generation (RAG) pipelines have demonstrated significant capability for complex reasoning tasks, yet remain vulnerable to a class of failure that existing hallucination detection mechanisms systematically miss: cascading hallucination, where…

  6. arXiv cs.AI TIER_1 English(EN) · Chenshuang Zhang, Kyeong Seon Kim, Chengxin Liu, Tae-Hyun Oh ·

    SVHalluc: Evaluating Speech-Visual Hallucination in Audio-Visual Large Language Models

    arXiv:2606.02642v1 Announce Type: cross Abstract: Despite the success of audio-visual large-language models (LLMs), they can produce plausible but ungrounded outputs, termed hallucination. Existing benchmarks focus on environmental sounds (e.g., dog barking) to indicate event occ…

  7. arXiv cs.AI TIER_1 English(EN) · Ruipeng Zhang, Zhihao Li, Haozhang Yuan, C. L. Philip Chen, Tong Zhang ·

    P²-DPO: Eliminating Hallucination in Perception Processing via Calibrated Direct Preference Optimization

    arXiv:2606.03376v1 Announce Type: cross Abstract: Hallucination has recently garnered significant research attention in Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) aims to learn directly from the corrected preferences provided by humans, thereby add…

  8. arXiv cs.AI TIER_1 English(EN) · Mingkuan Zhao, Wentao Hu, Tianchen Huang, Yuheng Min, Suquan Chen, Yide Gao, Yanbo Zhai, Shuangyong Song, Xuelong Li ·

    Hallucination as Orthogonal Noise: Inference-Time Manifold Alignment via Dynamic Context Orthogonalization

    arXiv:2606.03022v1 Announce Type: cross Abstract: Hallucination in Large Language Models (LLMs), characterized by the generation of content inconsistent with contextual facts or logical constraints -- remains a persistent challenge for reliable deployment. In this work, we addres…

  9. arXiv cs.LG TIER_1 English(EN) · Mahdi Erfanian, Nelson Daniel Troncoso, Aashna Garg, Amabel Gale, Xiaoyu Liu, Pareesa Ameneh Golnari, Shengyu Fu ·

    Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

    arXiv:2606.03130v1 Announce Type: new Abstract: Small open-source code models that power IDE autocomplete still emit hallucinated Fill-in-the-Middle (FIM) completions: syntactically natural calls to methods, parameters, variables, and imports that do not exist in the surrounding …

  10. arXiv cs.CL TIER_1 English(EN) · Aizierjiang Aiersilan ·

    Hallucination in Quantized Large Models Is Linearly Decodable from Mid-Layer Hidden States

    arXiv:2606.02628v1 Announce Type: cross Abstract: We investigate whether open-source LLMs encode a linearly separable truthfulness signal in their hidden states, and at which network depth this signal is strongest. Across three 7B-8B instruction-tuned models (Llama-3.1-8B, M…

  11. arXiv cs.AI TIER_1 English(EN) · Yuetian Lu, Yihong Liu, Sebastian Gerstner, Lea Hirlimann, Jonas Rohweder, Hinrich Schütze ·

    Relational Linearity Is a Predictor of Hallucination

    arXiv:2601.11429v2 Announce Type: replace-cross Abstract: Hallucination is a central failure mode of language models (LMs). We focus on hallucinations in response to questions like: "Which instrument did Glenn Gould play?", but we ask these questions for synthetic entities design…

  12. arXiv cs.AI TIER_1 English(EN) · Lin Li, Georgia Channing, Suhaas M Bhat, Gabriel Davis Jones, Yarin Gal ·

    Reliable Long-Form Generation via Hallucination Rejection Sampling

    arXiv:2606.03628v1 Announce Type: cross Abstract: Large language models (LLMs) have achieved remarkable progress in open-ended text generation, yet they remain prone to hallucinating incorrect or unsupported content, which undermines their reliability. This issue is exacerbated i…

  13. arXiv cs.AI TIER_1 English(EN) · Yarin Gal ·

    Reliable Long-Form Generation via Hallucination Rejection Sampling

    Large language models (LLMs) have achieved remarkable progress in open-ended text generation, yet they remain prone to hallucinating incorrect or unsupported content, which undermines their reliability. This issue is exacerbated in long-form generation due to hallucination snowba…

  14. arXiv cs.CL TIER_1 English(EN) · Tong Zhang ·

    P²-DPO: Eliminating Hallucination in Perception Processing via Calibrated Direct Preference Optimization

    Hallucination has recently garnered significant research attention in Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) aims to learn directly from the corrected preferences provided by humans, thereby addressing the hallucination issue. Despite its succe…

  15. arXiv cs.AI TIER_1 English(EN) · Kaixiang Zhao, Tianrun Yu, Shawn Huang, Porter Jenkins, Yushun Dong, Amanda Hughes ·

    TIGER: Traceable Reasoning with Graph-Based Evidence Routing for Mitigating Hallucination in Multimodal Generation

    arXiv:2606.00232v1 Announce Type: new Abstract: We study fact-level repair for multimodal generation, where a fluent output may contain specific facts that are not supported by the input. Existing inference-time repair methods often generate feedback by jointly conditioning on th…

  16. arXiv cs.LG TIER_1 English(EN) · Yun-Chen Cheng, Che-Yu Lin, Cheng-Lin Yang ·

    Score × Decoder: A Unified View of Unsupervised Inference-Time Scaling for Hallucination Mitigation

    arXiv:2606.00739v1 Announce Type: new Abstract: Large language models hallucinate even when the answer lies within their parameters. While inference-time scaling can surface this latent knowledge, the most effective methods require supervision: a trained verifier or reward model.…

  17. arXiv cs.LG TIER_1 English(EN) · Wentao Ye, Liyao Li, Zhiqing Xiao, Muzhi Zhu, Jiaqi Hu, Zhanming Shen, Xiaomeng Hu, Sean Du, Haobo Wang ·

    FLaG: Fine-Grained Latent Grouping for Hallucination Detection

    arXiv:2606.00301v1 Announce Type: new Abstract: Hallucinations in large language models (LLMs) arise from heterogeneous failure mechanisms, making reliable detection difficult for any single global uncertainty score. In this work, we formulate hallucination detection as a mechani…

  18. arXiv cs.CL TIER_1 English(EN) · Yasser Hamidullah, Koel Dutta Chowdhury, Yusser Al Ghussin, Shakib Yazdani, Cennet Oguz, Josef van Genabith, Cristina España-Bonet ·

    Grounding or Guessing? Visual Signals for Detecting Hallucination in Sign Language Translation

    arXiv:2510.18439v3 Announce Type: replace Abstract: Hallucination, where models generate fluent text unsupported by visual evidence, remains a major flaw in vision-language models and is particularly critical in sign language translation (SLT). In SLT, meaning depends on precise …

  19. arXiv cs.CL TIER_1 English(EN) · Mohit Singh Chauhan ·

    DECK: A Consistency and Confidence Taxonomy of LLM Hallucination

    arXiv:2606.02289v1 Announce Type: new Abstract: Existing hallucination taxonomies classify LLM errors by what is wrong with the output -- memorised misconceptions, reasoning failures, fluent fabrications. These taxonomies are useful for diagnosis but cannot answer a different que…

  20. arXiv cs.CL TIER_1 English(EN) · Yiming Liao, Zeno Franco, Jose Eduardo Lizarraga Mazaba, Keke Chen ·

    Med-HEAL: Analyzing and Mitigating Hallucination in Medical Large Models via "Hallucination-Aware In-Context Learning"

    arXiv:2606.01301v1 Announce Type: new Abstract: Hallucinations in medical large language models (LLMs) pose serious risks for clinical decision support, particularly when models must reason over complex electronic health records (EHRs). However, existing benchmarks often lack a r…

  21. arXiv cs.CL TIER_1 English(EN) · S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan ·

    Towards Lightweight Reliability: Mitigating Hallucination in Large Language Models with Soft Prompts

    arXiv:2606.00919v1 Announce Type: new Abstract: Large language models (LLMs) have seen widespread adoption across various domains, yet their reliability is frequently undermined by hallucinations - responses that are plausible-sounding but factually incorrect. In high-stakes doma…

  22. arXiv cs.AI TIER_1 English(EN) · Buyun Liang, Jinqi Luo, Liangzu Peng, Kwan Ho Ryan Chan, Darshan Thaker, Kaleab A. Kinfu, Fengrui Tian, Hamed Hassani, René Vidal ·

    REALISTA: Realistic Latent Adversarial Attacks That Induce Hallucination in Large Language Models

    arXiv:2605.12813v2 Announce Type: replace-cross Abstract: Large language models (LLMs) achieve strong performance across many tasks but remain vulnerable to hallucinations, making it important to systematically evaluate their reliability under realistic adversarial inputs. We for…

  23. arXiv cs.AI TIER_1 English(EN) · Bohan Yang, Yijun Gong, Zhi Zhang, Ge Zhang, Wenpeng Xing, Meng Han ·

    TriLens: Layer-Wise Logit-Lens Entropy for White-Box Hallucination Detection

    arXiv:2606.01033v1 Announce Type: new Abstract: When a language model hallucinates, the final answer is wrong, but the mistake is not necessarily invisible inside the model. Different internal pathways may remain uncertain, disagree in how quickly they sharpen, or commit to compe…

  24. arXiv cs.AI TIER_1 English(EN) · Hanze Li, Jinhao You, Yichen Guo, Kai Tang, Shuangyang Xie, Xiande Huang ·

    Mitigating Hallucination in Large Language Models by Skipping Decoder Layers

    arXiv:2606.00819v1 Announce Type: new Abstract: Large Language Models (LLMs) have achieved strong performance across diverse natural language tasks, yet their outputs often suffer from hallucinations -- content that is misaligned with factual information. In this work, we conduct…

  25. arXiv cs.CL TIER_1 English(EN) · Mohit Singh Chauhan ·

    DECK: A Consistency × Confidence Taxonomy of LLM Hallucination

    Existing hallucination taxonomies classify LLM errors by what is wrong with the output -- memorised misconceptions, reasoning failures, fluent fabrications. These taxonomies are useful for diagnosis but cannot answer a different question: which uncertainty scorer would have caugh…

  26. arXiv cs.AI TIER_1 English(EN) · Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li ·

    Cross-Modal Attention Calibration for LVLM Hallucination Mitigation

    arXiv:2501.01926v3 Announce Type: replace-cross Abstract: Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding. Despite their success, LVLMs still suffer from generating hallucinations in complex generation tasks, leading to inc…

  27. arXiv cs.AI TIER_1 English(EN) · Litian Liu, Reza Pourreza, Yubing Jian, Yao Qin, Roland Memisevic ·

    From Out-of-Distribution Detection to Hallucination Detection: A Geometric Perspective

    arXiv:2602.07253v2 Announce Type: replace Abstract: Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answeri…

  28. arXiv cs.AI TIER_1 English(EN) · Soorya Ram Shimgekar, Agam Goyal, Amruta Parulekar, Joshua Chen, Yian Wang, Navin Kumar, Hari Sundaram, Eshwar Chandrasekharan, Koustuv Saha ·

    Toxic Hallucinations: Perturbing Prompts and Tracing Large Language Model Circuits

    arXiv:2605.30913v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed in conversational settings where user tone ranges from polite to adversarial or toxic, yet less is known about whether toxic language in otherwise semantically equivalent prom…

  29. arXiv cs.CL TIER_1 English(EN) · Shefayat E Shams Adib, Ahmed Alfey Sani, Ekramul Alam Esham, Ajwad Abrar, Ishmam Tashdeed, Md Taukir Azam Chowdhury ·

    BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Bengali Large Language Models

    arXiv:2605.31483v1 Announce Type: new Abstract: Despite Bengali being the sixth most spoken language in the world, no prior work has systematically evaluated hallucination in large language models (LLMs) for Bengali. We introduce BenHalluEval, a fine-grained hallucination evaluat…

  30. arXiv cs.CL TIER_1 English(EN) · Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen, Haonan Lu, Xuming Hu ·

    Learning from Fine-Grained Visual Differences: Mitigating Multimodal Hallucination via In-Context Visual Contrastive Optimization

    arXiv:2605.31312v1 Announce Type: cross Abstract: Multimodal hallucination remains a persistent challenge for Vision-Language Models (VLMs). Standard textual Direct Preference Optimization (DPO) often fails to mitigate it due to a lack of explicit visual supervision. While existi…

  31. arXiv cs.AI TIER_1 English(EN) · Yusheng He, Jizhe Zhou, Xia Du, Zheng Lin, Jun Luo, Jiancheng Lv ·

    What Makes LVLMs Hallucinate Less? Revealing the Architectural Factors Behind Hallucination Robustness

    arXiv:2605.30911v1 Announce Type: cross Abstract: Hallucination remains one of the key challenges undermining the reliability of Large Vision-Language Models (LVLMs). But what makes an LVLM hallucinate less? Many existing efforts focus on improving internal components of the mode…

  32. Hugging Face Daily Papers TIER_1 English(EN) ·

    BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Bengali Large Language Models

    Despite Bengali being the sixth most spoken language in the world, no prior work has systematically evaluated hallucination in large language models (LLMs) for Bengali. We introduce BenHalluEval, a fine-grained hallucination evaluation framework for Bengali covering four tasks: G…

  33. arXiv cs.CL TIER_1 English(EN) · Md Taukir Azam Chowdhury ·

    BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Bengali Large Language Models

    Despite Bengali being the sixth most spoken language in the world, no prior work has systematically evaluated hallucination in large language models (LLMs) for Bengali. We introduce BenHalluEval, a fine-grained hallucination evaluation framework for Bengali covering four tasks: G…

  34. arXiv cs.CL TIER_1 English(EN) · Xuming Hu ·

    Learning from Fine-Grained Visual Differences: Mitigating Multimodal Hallucination via In-Context Visual Contrastive Optimization

    Multimodal hallucination remains a persistent challenge for Vision-Language Models (VLMs). Standard textual Direct Preference Optimization (DPO) often fails to mitigate it due to a lack of explicit visual supervision. While existing works introduce visual preference DPO by contra…

  35. arXiv cs.CL TIER_1 English(EN) · Chaodong Tong, Qi Zhang, Zhuojun Jiang, Lei Jiang, Yanbing Liu ·

    HaluNet: Learning Hallucination Risk from Internal Signals in LLM Question Answering

    arXiv:2512.24562v2 Announce Type: replace Abstract: Large language models (LLMs) achieve strong question answering (QA) performance but can produce fluent answers unsupported by available evidence. Existing hallucination detectors often rely on external verification, repeated sam…

  36. arXiv cs.LG TIER_1 English(EN) · Eunbyeol Cho, Yunseung Lee, Mirae Kim, Jeewon Yang, Youngjun Kwak, Edward Choi ·

    K-FinHallu: A Hallucination Detection Benchmark for Korean Financial Multi-Turn RAG

    arXiv:2605.29523v1 Announce Type: new Abstract: Large Language Models (LLMs) have advanced financial automation through Retrieval-Augmented Generation (RAG), yet hallucinations remain a critical barrier to deployment in high-stakes environments. Existing benchmarks focus on singl…

  37. arXiv cs.LG TIER_1 English(EN) · Chia-Yi Hsu, Chia-Mu Yu, Chun-Ying Huang, Jun Sakuma ·

    Harmless Yet Harmful: Neutral Prompt Attacks in Agent Skills for Stealthy Hallucination Steering

    arXiv:2605.29354v1 Announce Type: cross Abstract: LLM-powered coding agents increasingly participate in software development workflows by generating code, selecting dependencies, and producing package installation commands. This creates a new software supply chain risk: when an a…

  38. arXiv cs.AI TIER_1 English(EN) · Zhe Qian, Yanbiao Ma, Zhuohan Ouyang, Zhonghua Wang, Zhongxing Xu, Fei Luo, Xinyu Liu, Zongyuan Ge, Yike Guo, Jungong Han ·

    Cognitive Pivot Points and Visual Anchoring: Revealing and Correcting Hallucination in Multimodal Reasoning Models

    arXiv:2604.10219v2 Announce Type: replace Abstract: Multimodal Large Reasoning Models (MLRMs) have achieved remarkable strides in visual reasoning through test time compute scaling, yet long chain reasoning remains prone to hallucinations. We identify a concerning phenomenon term…

  39. arXiv cs.AI TIER_1 English(EN) · Soumyadeep Jana, Pulkit Mittal, Sanasam Ranbir Singh ·

    Mitigating Hallucination in Vision-Language Models via Barrier-Regulated Adaptive Closed-Form Steering

    arXiv:2605.29881v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) often hallucinate objects that are not present in the input image, largely because visual grounding weakens as decoding progresses. Existing inference-time mitigation methods modify logits or h…

  40. arXiv cs.AI TIER_1 English(EN) · Shamanth Kuthpadi Seethakantha, Dung Ngoc Thai, Vara Prasad Gudi, Simran Tiwari, Rami Matar, Avijit Mitra, Wenlong Zhao, Wael Salloum, Andrew McCallum ·

    Hallucination-Detection-Guided Preference Optimization for Clinical Summarization

    arXiv:2605.28910v1 Announce Type: cross Abstract: Large language models (LLMs) have shown promise on summarization tasks, but they often produce hallucinations, which are unsupported or incorrect statements that limit their reliability in specialized healthcare applications. We i…

  41. arXiv cs.AI TIER_1 English(EN) · Diego Gosmar, Deborah A. Dahl ·

    Mitigating Hallucination with Agentic AI, Nested Learning, and Semantic Caching for AI Sustainability

    arXiv:2605.29055v1 Announce Type: new Abstract: Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claims can propagate unchecked across stages. This paper adapts a HOPE-inspired Nested Learning ar…

  42. Hugging Face Daily Papers TIER_1 English(EN) ·

    Score-Control for Hallucination Reduction in Diffusion Models

    Variance-Guided Score Modulation reduces hallucinations in diffusion models by controlling score function smoothness through Jacobian modulation while maintaining image quality.

  43. arXiv cs.CL TIER_1 English(EN) · Saptarshi Sengupta, Suhang Wang ·

    Is Hallucination Beneficial? Solving Multi-Hop Questions with SLMs via Chained System 1/2 Reasoning

    arXiv:2605.27596v1 Announce Type: new Abstract: Recently, there has been increased interest in Small Language Models (SLMs), which are fast, show good performance, and have lower hardware demands than large language models (LLMs). However, SLMs hallucinate more frequently than LL…

  44. arXiv cs.CL TIER_1 English(EN) · Yuang Huang, Yafeng Zhang, Yu Zilan ·

    Risk-Aware Selective Prompting for Hallucination Mitigation in Large Vision-Language Models

    arXiv:2605.28123v1 Announce Type: new Abstract: Prompt-based verification is widely used to mitigate hallucinations in large vision-language models (LVLMs), yet when it helps remains poorly understood. We systematically study verification prompting across two representative LVLM …

  45. arXiv cs.AI TIER_1 English(EN) · Partho Ghose, Al Bashir, Prem Raj, Azlan Zahid ·

    Hallucination Behavior of Multimodal Large Language Models in Agricultural Image Interpretation and Generation Tasks

    arXiv:2605.27595v1 Announce Type: cross Abstract: Large Language Models (LLMs) are being rapidly adopted in agricultural imaging applications, ranging from crop interpretation to synthetic field image generation. However, these models frequently exhibit hallucinations, outputs tha…

  46. arXiv cs.CL TIER_1 English(EN) · Jingwen Wu, Xijun Zhang, Ge Song ·

    Rethinking Visual Neglect: Mitigating MLLM Hallucination via Contextual Preference Steering

    arXiv:2605.27993v1 Announce Type: new Abstract: Object hallucination remains a primary obstacle to the reliable deployment of Multimodal Large Language Models (MLLMs). Current inference-time mitigation methods mainly assume hallucinations stem from visual neglect, steering models…

  47. arXiv cs.CL TIER_1 English(EN) · Joan Vendrell Gallart, Solmaz Kia, Russell Bent, Michael Grosskopf ·

    Chain-Based Adaptive Reconfiguration for Hallucination Reduction

    arXiv:2605.27706v1 Announce Type: new Abstract: We introduce CAROL (Chain-based Adaptive Reconfiguration Over Lattices), a probabilistic framework for test-time hallucination reduction in large language models. Rather than relying on token-level uncertainty, CAROL defines a seman…

  48. arXiv cs.AI TIER_1 English(EN) · Mattia J. Villani, Pranav Deshpande, Akshay Seshadri, Romina Yalovetzky, Niraj Kumar ·

    Entropy Distribution Fingerprints of Hallucination in Generative Models

    arXiv:2605.28264v1 Announce Type: new Abstract: Large Language Models (LLMs) often generate factually incorrect outputs, commonly termed hallucinations, that undermine trust and limit deployment in high-stakes settings. Existing hallucination detection methods typically require m…

  49. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Deborah A. Dahl ·

    Mitigating Hallucination with Agentic AI, Nested Learning, and Semantic Caching for AI Sustainability

    Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claims can propagate unchecked across stages. This paper adapts a HOPE-inspired Nested Learning architecture with Continuum Memory Systems (CMS) a…

  50. arXiv cs.AI TIER_1 English(EN) · Xinpeng Wang, William Cao, Andrew Gordon Wilson, Zhe Zeng ·

    Automatic Layer Selection for Hallucination Detection

    arXiv:2605.26366v1 Announce Type: new Abstract: Recent studies on hallucination detection have shown that hallucination-related signals are more strongly encoded in intermediate layers than in the final layer of large language models (LLMs). Although a growing body of work has so…

  51. arXiv cs.AI TIER_1 English(EN) · Nishant P. Das, Piyush Srivastava ·

    Innovation: A Near-Characterization of Hallucination

    arXiv:2605.26808v1 Announce Type: cross Abstract: Hallucination is a central limitation of large language models (LLMs), and substantial effort has been devoted to understanding and mitigating it. Towards this, Kalai and Vempala (STOC 2024) introduced a probabilistic framework fo…

  52. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Michael Grosskopf ·

    Chain-Based Adaptive Reconfiguration for Hallucination Reduction

    We introduce CAROL (Chain-based Adaptive Reconfiguration Over Lattices), a probabilistic framework for test-time hallucination reduction in large language models. Rather than relying on token-level uncertainty, CAROL defines a semantic uncertainty measure based on the consistency…

  53. Hugging Face Daily Papers TIER_1 English(EN) ·

    Hallucination Behavior of Multimodal Large Language Models in Agricultural Image Interpretation and Generation Tasks

    Large Language Models (LLMs) are being rapidly adopted in agricultural imaging applications, ranging from crop interpretation to synthetic field image generation. However, these models frequently exhibit hallucinations, outputs that appear confident yet deviate from biological or …

  54. arXiv cs.LG TIER_1 English(EN) · Piyush Srivastava ·

    Innovation: A Near-Characterization of Hallucination

    Hallucination is a central limitation of large language models (LLMs), and substantial effort has been devoted to understanding and mitigating it. Towards this, Kalai and Vempala (STOC 2024) introduced a probabilistic framework formalizing calibration and hallucination, and showe…

  55. arXiv cs.CL TIER_1 English(EN) · Riasad Alvi, Nurul Labib Sayeedi, Md. Faiyaz Abdullah Sayeedi ·

    MultiHaluDet: Multilingual Hallucination Detection via Probing LLM Hidden States

    arXiv:2605.24919v1 Announce Type: new Abstract: Hallucinations in Large Language Models (LLMs) represent a critical barrier to their reliable deployment, a vulnerability heavily exacerbated in non-English and resource-constrained contexts. Existing detection approaches that rely …

  56. arXiv cs.AI TIER_1 English(EN) · Yuanzhi Xu, Qian Gao, Jun Fan, Guohui Ding, Zhenyu Yang, Sixue Lin, Yuteng Xiao ·

    Mitigating Object Hallucination in Vision-Language Models via Region-Aware Attention Recalibration

    arXiv:2605.24957v1 Announce Type: new Abstract: The generation of factually incorrect objects, commonly known as object hallucination, remains a persistent challenge in Large Vision-Language Models (LVLMs). Current approaches to address this issue - ranging from expensive data-dr…

  57. arXiv cs.AI TIER_1 English(EN) · Hinduja Nirujan, Shreyas Patil, Abdallah Ayoub, Ahmad Abdel Latif, Gouri Ginde ·

    Empirical Analysis and Detection of Hallucination in LLM-Generated Bug Report Summaries

    arXiv:2605.24137v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used to generate summaries of software bug reports, including sections such as Steps-to-Reproduce (S2R), Actual Behavior (AB), and Expected Behavior (EB). However, these models frequen…

  58. arXiv cs.AI TIER_1 English(EN) · Quanjiang Li, Zhiming Liu, Wei Luo, Tingjin Luo, Chenping Hou ·

    Correcting Visual Blur Caused by Attention Dispersion to Reduce Hallucination: Algorithm and Theory

    arXiv:2605.24602v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) frequently suffer from object hallucinations, yet the visual perceptual mechanism underlying this failure remains poorly understood. In this work, we reveal that hallucinations are strongly…

  59. arXiv cs.AI TIER_1 English(EN) · Yuhao Zhan, Tianyu Fan, Linxuan Huang, Zirui Guo, Chao Huang ·

    Why Does Your Deep Research Agent Fail? On Hallucination Evaluation in Full Research Trajectories

    arXiv:2601.22984v2 Announce Type: replace Abstract: Diagnosing failure patterns in Deep Research Agents (DRAs) remains a critical challenge. Existing benchmarks predominantly rely on end-to-end evaluation, obscuring intermediate hallucinations that accumulate throughout the resea…

  60. arXiv cs.AI TIER_1 English(EN) · Shuqi Zhu, Yi Zhong, Ziyi Ye, Bangde Du, Yujia Zhou, Qingyao Ai, Yiqun Liu ·

    How Humans Process AI-Generated Hallucinated Content: A Neuroimaging Study

    arXiv:2605.16953v2 Announce Type: replace Abstract: While AI-generated hallucinations pose considerable risks, the underlying cognitive mechanisms by which humans can successfully recognize or be misled by these hallucinations remain unclear. To address this problem, this paper e…

  61. arXiv cs.CL TIER_1 English(EN) · Musarrat Zeba, Abdullah Al Mamun, Kishoar Jahan Tithee, Debopom Sutradhar, Mohaimenul Azam Khan Raiaan, Saddam Mukta, Reem E. Mohamed, Md Rafiqul Islam, Yakub Sebastian, Mukhtar Hussain, Sami Azam ·

    Mitigating Hallucination in Healthcare LLMs through Fine-Grained Fact-Checking and Domain-Specific Adaptation

    arXiv:2512.16189v3 Announce Type: replace Abstract: In healthcare, it is essential for any LLM-generated output to be reliable and accurate, particularly in cases involving decision-making and patient safety. However, the outputs are often unreliable in such critical areas due to…

  62. arXiv cs.AI TIER_1 English(EN) · Yutong Xie, Zhenglin Hua, Ran Wang, Wing W. Y. Ng, Xizhao Wang, Yuheng Jia ·

    Finding the Right Visual Evidence Without Forgetting: Mitigating LVLM Hallucination via Inter-Layer Visual Attention Discrepancy

    arXiv:2605.20965v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) have shown remarkable performance on a wide range of vision-language tasks. Despite this progress, they are still prone to hallucination, generating responses that are inconsistent with visual …

  63. arXiv cs.AI TIER_1 English(EN) · Jianfei Li, Ines Rosellon-Inclan, Gitta Kutyniok, Jean-Luc Starck ·

    CHEM: Estimating and Understanding Hallucination in Deep Learning Image Processing

    arXiv:2512.09806v2 Announce Type: replace-cross Abstract: Deep learning-based methods have recently achieved significant success in image reconstruction problems. However, challenges have emerged, as these methods may generate unrealistic artifacts or hallucinations, which can in…

  64. arXiv cs.AI TIER_1 English(EN) · Yuheng Jia ·

    Finding the Right Visual Evidence Without Forgetting: Mitigating LVLM Hallucination via Inter-Layer Visual Attention Discrepancy

    Large Vision-Language Models (LVLMs) have shown remarkable performance on a wide range of vision-language tasks. Despite this progress, they are still prone to hallucination, generating responses that are inconsistent with visual content. In this work, we find that LVLMs tend to …

  65. Hugging Face Daily Papers TIER_1 English(EN) ·

    VIHD: Visual-Intervention-Based Hallucination Detection for Medical Visual Question Answering

    While medical Multimodal Large Language Models (MLLMs) have shown promise in assisting diagnosis, they still frequently generate hallucinated responses that appear linguistically plausible but lack visual evidence. Such hallucinations pose risks to clinical decision-making and ne…

  66. arXiv cs.CL TIER_1 English(EN) · Tej Sanibh Ranade ·

    TRACE: Trajectory Correction Based on Cross-Layer Evidence for Hallucination Reduction

    Hallucination correction is not a one-direction problem. We show that intermediate layers are neither uniformly more truthful than final layers nor uniformly less trustworthy. Yet hallucination reduction is usually instantiated through one fixed intervention form: contrast one la…

  67. arXiv cs.CL TIER_1 English(EN) · Lijie Wen ·

    Do We Really Need External Tools to Mitigate Hallucination? SIRA: Shared-Prefix Internal Attribution Reconstruction

    Large vision-language models (LVLMs) often hallucinate when language priors dominate weak or ambiguous visual evidence. Existing contrastive decoding methods mitigate this problem by comparing predictions from the original image with those from externally perturbed visual inputs,…

  68. arXiv cs.CL TIER_1 English(EN) · Yubin Xia ·

    When the Answer Drifts from the Question: Detecting Hallucination via Question-Answer Orthogonal Decomposition

    Hallucination detection in large language models (LLMs) requires balancing accuracy, efficiency, and robustness to distribution shift. Black-box consistency methods are effective but demand repeated inference; single-pass white-box probes are efficient yet treat answer represen…

  69. arXiv cs.AI TIER_1 English(EN) · Ali Baheri ·

    Where Does Reasoning Fail? Step-Wise Hallucination Detection via Hidden-State Transport Geometry

    Large language models hallucinate during multi-step reasoning, but most existing detectors operate at the trace level: they assign one confidence score to a full output, fail to localize the first error, and often require multiple sampled completions. We frame hallucination inste…

  70. arXiv cs.AI TIER_1 English(EN) · Amine Trabelsi ·

    CAAFC: A Chronologically Actionable Automated Fact-Checker for Detecting and Correcting Misinformation/Non-Factual Hallucination

    With the vast amount of content uploaded every hour, along with the AI generated content that can include hallucinations, Automated Fact-Checking (AFC) has become increasingly vital, as it is infeasible for human fact-checkers to manually verify the sheer volume of information ge…

  71. arXiv cs.AI TIER_1 English(EN) · Yi R. Fung ·

    Scalable Token-Level Hallucination Detection in Large Language Models

    Large language models (LLMs) have demonstrated remarkable capabilities, but they still frequently produce hallucinations. These hallucinations are difficult to detect in reasoning-intensive tasks, where the content appears coherent but contains errors like logical flaws and unrel…

  72. arXiv cs.LG TIER_1 English(EN) · Ruixuan Wang ·

    Instruction Lens Score: Your Instructions Contribute a Powerful Object Hallucination Detector for Multimodal Large Language Models

    Multimodal large language models (MLLMs) have achieved remarkable progress, yet the object hallucination remains a critical challenge for reliable deployment. In this paper, we present an in-depth analysis of instruction token embeddings and reveal that they implicitly encode vis…

  73. arXiv cs.AI TIER_1 English(EN) · Yian Yin ·

    Real-World Cases of LLM Hallucination: Large-Scale Evidence from Nonexistent Citations

    Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific cit…

  74. arXiv cs.CL TIER_1 English(EN) · Brandon C. Colelough, Davis Bartels, Dina Demner-Fushman ·

    Quantifying Hallucinations in Language Models on Medical Textbooks

    arXiv:2603.09986v2 Announce Type: replace Abstract: Hallucinations, the tendency for large language models to provide responses with factually incorrect and unsupported claims, is a serious problem within natural language processing for which we do not yet have an effective solut…

  75. arXiv cs.CL TIER_1 English(EN) · Erik Nielsen, Elia Cunegatti, Marcus Vukojevic, Giovanni Iacca ·

    Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits

    arXiv:2605.05953v1 Announce Type: new Abstract: One of the most critical challenges in Large Language Models is their tendency to hallucinate, i.e., produce factually incorrect responses. Existing approaches show promising results in terms of hallucination correction, but still s…

  76. arXiv cs.CL TIER_1 English(EN) · Giovanni Iacca ·

    Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits

    One of the most critical challenges in Large Language Models is their tendency to hallucinate, i.e., produce factually incorrect responses. Existing approaches show promising results in terms of hallucination correction, but still suffer from a main limitation: they apply correct…

  77. Hugging Face Daily Papers TIER_1 English(EN) ·

    Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits

    One of the most critical challenges in Large Language Models is their tendency to hallucinate, i.e., produce factually incorrect responses. Existing approaches show promising results in terms of hallucination correction, but still suffer from a main limitation: they apply correct…

  78. arXiv cs.CL TIER_1 English(EN) · Mina Gabriel ·

    The First Token Knows: Single-Decode Confidence for Hallucination Detection

    arXiv:2605.05166v1 Announce Type: new Abstract: Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves …

  79. arXiv cs.LG TIER_1 English(EN) · Dan Wilson, Mohamed Akrout ·

    Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction

    arXiv:2605.05134v1 Announce Type: new Abstract: Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks …

  80. arXiv cs.CL TIER_1 English(EN) · Gijs van Dijk ·

    Detecting Hallucinations in Large Language Models Using Internal Attention Divergence Signals

    arXiv:2605.05025v1 Announce Type: new Abstract: We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or exte…

  81. arXiv cs.LG TIER_1 English(EN) · Linggang Kong, Lei Wu, Yunlong Zhang, Xiaofeng Zhong, Zhen Wang, Yongjie Wang, Yao Pan ·

    CausalGaze: Unveiling Hallucinations via Counterfactual Graph Intervention in Large Language Models

    arXiv:2604.11087v2 Announce Type: replace Abstract: Despite the groundbreaking advancements made by large language models (LLMs), hallucination remains a critical bottleneck for their deployment in high-stakes domains. Existing classification-based methods mainly rely on static a…

  82. arXiv cs.CL TIER_1 English(EN) · Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson, Vijaykrishnan Narayanan ·

    When Relations Break: Analyzing Relation Hallucination in Vision-Language Models Under Rotation and Noise

    arXiv:2605.05045v1 Announce Type: cross Abstract: Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, which requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifi…

  83. arXiv cs.CL TIER_1 English(EN) · Mina Gabriel ·

    The First Token Knows: Single-Decode Confidence for Hallucination Detection

    Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning us…

  84. arXiv cs.LG TIER_1 English(EN) · Mohamed Akrout ·

    Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction

    Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a ne…

  85. arXiv cs.CL TIER_1 English(EN) · Gijs van Dijk ·

    Detecting Hallucinations in Large Language Models Using Internal Attention Divergence Signals

    We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullba…

  86. arXiv cs.AI TIER_1 English(EN) · Ahmed Ibrahim ·

    Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse

    arXiv:2605.01562v1 Announce Type: cross Abstract: The Object-Oriented Method for Requirements Authoring and Management (OOMRAM) is a requirements reuse framework that relies on exact identifier matching and rigid templates, limiting its ability to adapt specifications across dive…

  87. arXiv cs.CL TIER_1 English(EN) · Severin Ye, Xiao Kong, Xiaopeng He, Guangsu Yan, Dongsuk Oh ·

    CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

    arXiv:2605.03476v1 Announce Type: new Abstract: Discharge summaries require extracting critical information from lengthy electronic health records (EHRs), a process that is labor-intensive when performed manually. Large language models (LLMs) can improve generation efficiency; ho…

  88. arXiv cs.CL TIER_1 English(EN) · Hao Mi, Qiang Sheng, Shaofei Wang, Beizhe Hu, Yifan Sun, Zhengjia Wang, Hengqi Zeng, Yang Li, Danding Wang, Juan Cao ·

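    Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label-Constraint Modeling Between Responses and Self-Judgments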

    arXiv:2605.03971v1 Announce Type: new Abstract: Large Language Models (LLMs) are prone to factual hallucinations, risking their reliability in real-world applications. Existing hallucination detectors mainly extract micro-level intrinsic patterns for uncertainty quantification or…

  89. arXiv cs.CL TIER_1 English(EN) · Juan Cao ·

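    Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label-Constraint Modeling Between Responses and Self-Judgments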

    Large Language Models (LLMs) are prone to factual hallucinations, risking their reliability in real-world applications. Existing hallucination detectors mainly extract micro-level intrinsic patterns for uncertainty quantification or elicit macro-level self-judgments through verba…

  90. Hugging Face Daily Papers TIER_1 English(EN) ·

    CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

    Discharge summaries require extracting critical information from lengthy electronic health records (EHRs), a process that is labor-intensive when performed manually. Large language models (LLMs) can improve generation efficiency; however, they are prone to producing faithfulness …

  91. arXiv cs.CL TIER_1 English(EN) · Dongsuk Oh ·

    CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

    Discharge summaries require extracting critical information from lengthy electronic health records (EHRs), a process that is labor-intensive when performed manually. Large language models (LLMs) can improve generation efficiency; however, they are prone to producing faithfulness …

  92. arXiv cs.CL TIER_1 English(EN) · Alexandra Bazarova, Aleksandr Yugay, Andrey Shulga, Alina Ermilova, Andrei Volodichev, Konstantin Polev, Julia Belikova, Rauf Parchiev, Dmitry Simakov, Maxim Savchenko, Andrey Savchenko, Serguei Barannikov, Alexey Zaytsev ·

    Detecting Hallucinations in Large Language Models Using Topological Divergence on Attention Graphs

    arXiv:2504.10063v4 Announce Type: replace Abstract: Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models (LLMs). We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting, which leverages a topolog…

  93. arXiv cs.CL TIER_1 English(EN) · Ahmed Cherif ·

    HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs

    arXiv:2605.02443v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks, yet they remain susceptible to hallucinations -- generating content that is factually incorrect, unfaithful to …

  94. arXiv cs.CL TIER_1 English(EN) · Freja Thoresen, Dan Saattrup Smart ·

    A Multilingual Hallucination Benchmark: MultiWikiQHalluA

    arXiv:2605.02504v1 Announce Type: new Abstract: Most hallucination evaluations focus on English, leaving it unclear whether findings transfer to lower-resource languages. We investigate faithfulness hallucinations, defined as model-generated content that is fluent and plausible b…

  95. arXiv cs.CL TIER_1 English(EN) · Joseph Spracklen, Pedram Aghazadeh, Farinaz Koushanfar, Murtuza Jadliwala ·

    LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning

    arXiv:2605.01047v1 Announce Type: cross Abstract: Hallucinations, outputs that sound plausible but are factually incorrect, remain an open challenge for deployed LLMs. In code generation, models frequently hallucinate non-existent software packages, recommending imports and insta…

  96. arXiv cs.LG TIER_1 English(EN) · Yee Zhing Liew, Andrew Huey Ping Tan, Anwar P. P Abdul Majeed ·

    From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity

    arXiv:2605.00939v1 Announce Type: new Abstract: Traditional hallucination detection fails on "Stubborn Hallucinations" -- errors where LLMs are confidently wrong. We propose a geometric solution: Embedding-Perturbed Gradient Sensitivity (EPGS). We hypothesize that while robust fa…

  97. arXiv cs.LG TIER_1 English(EN) · Jianxiong Zhang, Bing Guo, Yuming Jiang, Haobo Wang, Bo An, Sean Du ·

    Leveraging Reasoning Trajectories for Hallucination Detection via Answer-Agreement Representation Shaping

    arXiv:2601.17467v2 Announce Type: replace Abstract: Large reasoning models (LRMs) often generate long, seemingly coherent reasoning traces yet still produce incorrect answers, making hallucination detection challenging. Although trajectories contain useful signals, directly using…

  98. arXiv cs.CL TIER_1 English(EN) · Dan Saattrup Smart ·

    A Multilingual Hallucination Benchmark: MultiWikiQHalluA

    Most hallucination evaluations focus on English, leaving it unclear whether findings transfer to lower-resource languages. We investigate faithfulness hallucinations, defined as model-generated content that is fluent and plausible but diverges from the provided input or is intern…

  99. arXiv cs.CL TIER_1 English(EN) · Ahmed Cherif ·

    HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs

    Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks, yet they remain susceptible to hallucinations -- generating content that is factually incorrect, unfaithful to provided context, or misaligned with user instru…

  100. Hugging Face Daily Papers TIER_1 English(EN) ·

    Mitigating Hallucinations in Multimodal Large Language Models via Inference-Time Relevance Propagation

    Multimodal large language models (MLLMs) have revolutionized the landscape of AI, demonstrating impressive capabilities in tackling complex vision and audio-language tasks. However, a critical challenge remains: these models often suffer from hallucinations, generating outputs th…

  101. arXiv cs.CL TIER_1 English(EN) · Guoshenghui Zhao, Weijie Zhao, Tan Yu ·

    HIVE: Hidden-Evidence Verification for Hallucination Detection in Diffusion Large Language Models

    arXiv:2604.26139v1 Announce Type: new Abstract: Diffusion large language models generate text through multi-step denoising, where hallucination signals may emerge throughout the trajectory rather than only in the final output. Existing detectors mainly rely on output uncertainty …

  102. arXiv cs.CL TIER_1 English(EN) · Jiawei Li, Akshayaa Magesh, Venugopal V. Veeravalli ·

    Principled Detection of Hallucinations in Large Language Models via Multiple Testing

    arXiv:2508.18473v3 Announce Type: replace Abstract: While Large Language Models (LLMs) have emerged as powerful foundational models to solve a variety of tasks, they have also been shown to be prone to hallucinations, i.e., generating responses that sound confident but are actual…

  103. arXiv cs.CL TIER_1 English(EN) · Tan Yu ·

    HIVE: Hidden-Evidence Verification for Hallucination Detection in Diffusion Large Language Models

    Diffusion large language models generate text through multi-step denoising, where hallucination signals may emerge throughout the trajectory rather than only in the final output. Existing detectors mainly rely on output uncertainty or coarse trace statistics, which often fail to …

  104. arXiv cs.AI TIER_1 English(EN) · Federico A. Kamelhar ·

    GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs

    arXiv:2604.23366v1 Announce Type: new Abstract: Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is grounded in observed evidence rather …

  105. Hugging Face Daily Papers TIER_1 English(EN) ·

    Global Context or Local Detail? Adaptive Visual Grounding for Hallucination Mitigation

    Vision-Language Models (VLMs) are frequently undermined by object hallucination--generating content that contradicts visual reality--due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervene…

  106. Hugging Face Daily Papers TIER_1 English(EN) ·

    GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs

    Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is grounded in observed evidence rather than model-internal inference. Existing grounded…

  107. arXiv cs.CV TIER_1 English(EN) · Yue Jiang, Xue Jiang, Lihua Zhang, Zhiqiang Wang, Yuhang Lu, Peng Wang, Bo Han, Feng Zheng, Dingkang Yang ·

    MM-Snowball: Evaluating and Mitigating Hallucination Snowballing in Multimodal Multi-Turn Dialogue

    arXiv:2606.00622v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) demonstrate remarkable visual understanding, yet their reliability in interactive settings is severely undermined by hallucination snowballing: a phenomenon where initial errors amplify acros…

  108. arXiv cs.CV TIER_1 English(EN) · Mahesh Bhosale, Naresh Kumar Devulapally, Abdul Wasi, Chau Pham, Vishnu Suresh Lokhande, David Doermann ·

    Score-Control for Hallucination Reduction in Diffusion Models

    arXiv:2606.00377v1 Announce Type: new Abstract: Diffusion models have emerged as the backbone of modern generative AI, powering advances in vision, language, audio and other modalities. Despite their success, they suffer from hallucinations, implausible samples that lie outside t…

  109. arXiv cs.CV TIER_1 English(EN) · Ting Chen, Geng Li, Guohao Chen, Yu Hu, Guan Huang, Mai Chen, Langsheng Lei, Jun Du ·

    YARD: Y-Architecture Register Decoding for Efficient Hallucination Mitigation in Large Vision-Language Models

    arXiv:2605.31429v1 Announce Type: new Abstract: Contrastive decoding (CD) seeks to mitigate hallucinations in Large Vision-Language Models (LVLMs) by contrasting the output distributions of a standard model and a visually degraded model. However, existing training-free CD methods…

  110. arXiv cs.CV TIER_1 English(EN) · Zheng Qi, Chao Shang, Evangelia Spiliopoulou, Nikolaos Pappas ·

    Capturing Gaze Shifts for Guidance: Cross-Modal Fusion Enhancement for VLM Hallucination Mitigation

    arXiv:2510.22067v3 Announce Type: replace Abstract: Vision language models (VLMs) often generate hallucination, i.e., content that cannot be substantiated by either textual or visual inputs. Prior work primarily attributes this to over-reliance on linguistic prior knowledge rathe…

  111. arXiv cs.CV TIER_1 English(EN) · Jun Du ·

    YARD: Y-Architecture Register Decoding for Efficient Hallucination Mitigation in Large Vision-Language Models

    Contrastive decoding (CD) seeks to mitigate hallucinations in Large Vision-Language Models (LVLMs) by contrasting the output distributions of a standard model and a visually degraded model. However, existing training-free CD methods suffer from sub-optimal degraded branches: comp…

  112. arXiv cs.CV TIER_1 English(EN) · Shizhe Zhou, Bohan Jia, Kai Wu, Yan Shen, Tongyun Li, Yuyang Wu, Shaohui Lin ·

    ReactBench: Causal Benchmarking of Multimodal Hallucinations Through Systematic Evaluation

    arXiv:2605.29579v1 Announce Type: new Abstract: While multimodal large language models (MLLMs) have achieved rapid progress in vision-language understanding, they remain prone to multimodal hallucinations, producing responses that are inconsistent with the visual input. Existing …

  113. arXiv cs.CV TIER_1 English(EN) · Jiacheng Zhang, Feng Liu, Chao Du, Tianyu Pang ·

    SAVAA: Mitigating Hallucinations in LVLMs via Step-Wise Adaptive Visual Attention Amplification

    arXiv:2602.13600v2 Announce Type: replace Abstract: A line of recent training-free methods for mitigating hallucinations in large vision-language models (LVLMs) operates by amplifying attention to visual tokens during autoregressive generation within a single forward pass. We ref…

  114. arXiv cs.CV TIER_1 English(EN) · Sanasam Ranbir Singh ·

    Mitigating Hallucinations in Vision-Language Models via Barrier-Regulated Adaptive Closed-Form Control

    Large vision-language models (LVLMs) often hallucinate objects that are not present in the input image, largely because visual grounding weakens as decoding progresses. Existing inference-time mitigation methods modify logits or hidden states throughout generation, but they suffe…

  115. arXiv cs.CV TIER_1 English(EN) · Hyunmin Cho, Donghoon Ahn, Susung Hong, Jee Eun Kim, Seungryong Kim, Kyong Hwan Jin ·

    TAG: Tangential Amplifying Guidance for Hallucination-Resistant Sampling

    arXiv:2510.04533v2 Announce Type: replace Abstract: Diffusion models achieve state-of-the-art image generation but often produce semantic inconsistencies, or hallucinations. Existing inference-time guidance methods rely on external signals or architectural modifications, adding c…

  116. arXiv stat.ML TIER_1 English(EN) · Yedidia Agnimo, Anna Korba, Annabelle Blangero, Nicolas Chesneau, Karteek Alahari ·

    Evaluating the Relevance of Uncertainty Estimators for Hallucinations in Large Language Models

    arXiv:2605.27016v1 Announce Type: cross Abstract: Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment. In parallel, numerous uncertainty estimation (UE) methods have been proposed to q…

  117. arXiv stat.ML TIER_1 English(EN) · Karteek Alahari ·

    Evaluating the Relevance of Uncertainty Estimators for Hallucinations in Large Language Models

    Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment. In parallel, numerous uncertainty estimation (UE) methods have been proposed to quantify model confidence and are often implicitly …

  118. arXiv cs.CV TIER_1 English(EN) · Jiayi Chen, Benteng Ma, Zehui Liao, Winston Chong, Yasmeen George, Jianfei Cai ·

    VIHD: Visual Intervention-Based Hallucination Detection for Medical Visual Question Answering

    arXiv:2605.20772v2 Announce Type: replace Abstract: While medical Multimodal Large Language Models (MLLMs) have shown promise in assisting diagnosis, they still frequently generate hallucinated responses that appear linguistically plausible but lack visual evidence. Such hallucin…

  119. arXiv cs.CV TIER_1 English(EN) · Zhe Cheng, Wenyu Chen, Fode Zhang, Dehuan Shen ·

    Mitigating Hallucinations in Large Vision-Language Models via Causal Route Gating

    arXiv:2605.24024v1 Announce Type: new Abstract: Large vision-language models (LVLMs) often hallucinate content that is fluent yet unsupported by the image, limiting their reliability in real-world deployment. We show that a key failure mode arises from route competition: even whe…

  120. arXiv cs.CV TIER_1 English(EN) · Shangpin Peng, Senqiao Yang, Li Jiang, Zhuotao Tian ·

    Mitigating Object Hallucinations via Sentence-Level Early Intervention

    arXiv:2507.12455v3 Announce Type: replace Abstract: Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations - fabricated content contradicting visual inputs. Existing hallucination mitigation methods eith…

  121. arXiv cs.CV TIER_1 English(EN) · Deepu Rajan ·

    Reducing Object Hallucination in LVLMs by Emphasizing Image-Negative Tokens

    Object hallucination is a significant challenge that hinders the application of large vision-language models (LVLMs) in practice. We hypothesize that one possible origin of hallucination is the model's tendency to prioritize text generation over meaningful interaction with images…

  122. arXiv stat.ML TIER_1 English(EN) · Emmy Liu, Varun Gangal, Michael Yu, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng ·

    HalluWorld: Benchmarking Hallucination Control via Reference World Models

    arXiv:2605.19341v1 Announce Type: cross Abstract: Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. Thi…

  123. arXiv stat.ML TIER_1 English(EN) · Steven Y. Feng ·

    HalluWorld: Benchmarking Hallucination Control via Reference World Models

    Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigat…

  124. arXiv cs.CV TIER_1 English(EN) · Yu Wang ·

    MHSA: A Lightweight Framework for Mitigating LVLM Hallucinations via Guided Attention

    Large vision-language models (LVLMs) have achieved remarkable performance across diverse multimodal tasks, yet they continue to suffer from hallucinations, generating content that is inconsistent with the visual input. Prior work DHCP (Detecting Hallucinations by Cross-modal Atte…

  125. arXiv cs.CV TIER_1 English(EN) · Aofan Liu ·

    Dual-Pathway Circuits of Object Hallucination in Vision-Language Models

    Vision-language models (VLMs) have demonstrated remarkable capabilities in bridging visual perception and natural language understanding, enabling a wide range of multimodal reasoning tasks. However, they often produce object hallucinations, describing content absent from the inp…

  126. arXiv cs.CV TIER_1 English(EN) · Jing Li ·

    Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination

    Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal tasks, yet their reliability is persistently undermined by hallucinations-generating text that contradicts visual input. Recent studies often attribute these errors to inadequate visual attention…

  127. arXiv stat.ML TIER_1 English(EN) · Prabhat Kc, Rongping Zeng, Nirmal Soni, Aldo Badano ·

    sFRC for Assessing Hallucinations in Medical Image Restoration

    arXiv:2603.04673v2 Announce Type: replace-cross Abstract: Deep learning (DL) methods are currently being explored to restore images from sparse-view-, limited-data-, and undersampled-based acquisitions in medical applications. Although outputs from DL may appear visually appealin…

  128. arXiv cs.CV TIER_1 English(EN) · Vijaykrishnan Narayanan ·

    When Relations Break: Analyzing Relation Hallucination in Vision-Language Models Under Rotation and Noise

    Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, which requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild …

  129. arXiv cs.CV TIER_1 English(EN) · Itai Allouche, Joseph Keshet ·

    Mitigating Hallucinations in Multimodal Large Language Models via Inference-Time Relevance Propagation

    arXiv:2605.01766v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have revolutionized the landscape of AI, demonstrating impressive capabilities in tackling complex vision and audio-language tasks. However, a critical challenge remains: these models often…

  130. arXiv cs.CV TIER_1 English(EN) · Jianfei Zhao, Feng Zhang, Xin Sun, Chong Feng, Zhixing Tan ·

    Telling the Model Where to Look: Mitigating Hallucinations in MLLMs via Vision-Guided Attention

    arXiv:2511.20032v3 Announce Type: replace Abstract: Visual attention serves as the primary mechanism through which MLLMs interpret visual information; however, its limited localization capability often leads to hallucinations. We observe that although MLLMs can accurately extract…

  131. arXiv cs.CV TIER_1 English(EN) · Chengsheng Zhang, Chenghao Sun, Xinyan Jiang, Wei Li, Xinmei Tian ·

    Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models

    arXiv:2604.25642v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) have achieved remarkable progress in visual-textual understanding, yet their reliability is critically undermined by hallucinations, i.e., the generation of factually incorrect or inconsistent re…

  132. arXiv cs.CV TIER_1 English(EN) · Xinmei Tian ·

    Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models

    Large Vision-Language Models (LVLMs) have achieved remarkable progress in visual-textual understanding, yet their reliability is critically undermined by hallucinations, i.e., the generation of factually incorrect or inconsistent responses. While recent studies using steering vec…

  133. arXiv cs.CV TIER_1 English(EN) · Jiawei Chen, Dingkang Yang, Tong Wu, Yue Jiang, Xiaolu Hou, Mingcheng Li, Shunli Wang, Dongling Xiao, Ke Li, Lihua Zhang ·

    Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

    arXiv:2406.10185v2 Announce Type: replace Abstract: Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications, including medical visual question answering and imaging report generation. While these models inherit the robust capabilities of foundati…

  134. arXiv cs.CV TIER_1 English(EN) · Zhiyuan Jiang, Weihao Hong, Xinlei Guan, Tejaswi Dhandu, Miles Q. Li, Meng Xu, Kuan Huang, Umamaheswara Rao Tida, Bingyu Shen, Daehan Kwak, Boyang Li ·

    An LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models

    arXiv:2604.18803v3 Announce Type: replace Abstract: Vision-Language Models (VLMs) are increasingly deployed in settings where reliable visual grounding carries operational consequences, yet their behavior under progressively coercive prompt phrasing remains undercharacterized. Ex…

  135. arXiv cs.CV TIER_1 English(EN) · Yubo Jiang, Xin Yang, Abudukelimu Wuerkaixi, Zheming Yuan, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang ·

    Global Context or Local Detail? Adaptive Visual Grounding for Hallucination Mitigation

    arXiv:2604.24396v1 Announce Type: new Abstract: Vision-Language Models (VLMs) are frequently undermined by object hallucination--generating content that contradicts visual reality--due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a …

  136. arXiv cs.CV TIER_1 English(EN) · JiYang Wang, Jiawei Chen, Mengqi Xiao, Yu Cheng, Yangfu Li, Zhaoxia Yin ·

    DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models

    arXiv:2604.22822v1 Announce Type: new Abstract: Object level hallucination remains a central reliability challenge for vision language models (VLMs), particularly in binary object existence verification. Existing benchmarks emphasize aggregate accuracy but rarely disentangle whet…

  137. arXiv cs.CV TIER_1 English(EN) · Haopeng Zhang ·

    Global Context or Local Detail? Adaptive Visual Grounding for Hallucination Mitigation

    Vision-Language Models (VLMs) are frequently undermined by object hallucination--generating content that contradicts visual reality--due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervene…

  138. Eugene Yan TIER_1 English(EN) ·

    Out-of-Domain Finetuning to Bootstrap Hallucination Detection

    How to use open-source, permissive-use data and collect less labeled samples for our tasks.

  139. Medium — MLOps tag TIER_1 English(EN) · Anil Nayak ·

    Beyond ChatGPT Wrappers: Architecting a Cost-Optimized, Zero-Hallucination AI Platform

    https://medium.com/@nayakanil43603/beyond-chatgpt-wrappers-architecting-a-cost-optimized-zero-hallucination-ai-platform-c3a51fcc639c?source=rss------mlops-5

  140. Medium — Claude tag TIER_1 English(EN) · Akashshettyonline ·

    1/10 Ways to Reduce Hallucinations in LLM Applications: Grounding with RAG

    https://medium.com/@akashshettyonline22/1-10-ways-to-reduce-hallucinations-in-llm-applications-grounding-with-rag-051102434e6f?source=rss------claude-5

  141. Towards AI TIER_1 English(EN) · Faheem Ahmed ·

    When AI Lies to You: Hallucinations, Data Quality, and Old-School Solutions

  142. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

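    "Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing": This paper says that hallucinations are an inevitable consequence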

    "Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing" This paper says that hallucinations are an inevitable consequence of the way that information is compressed in a lossy way to be stored in LLMs by comparing this to the math of …

  143. dev.to — LLM tag TIER_1 English(EN) · Muhammad Muzammil ·

    LongTracer: Open-Source RAG Hallucination Detection Without LLM-as-a-Judge

    Stop paying to evaluate your LLM outputs. Stop tolerating non-deterministic quality gates. LongTracer is the MIT-licensed Python library that catches RAG hallucinations at inference time: no API calls, no cloud dependency, no per-verification cost. The Hallucin…

  144. dev.to — LLM tag TIER_1 English(EN) · AlterLab ·

    Optimizing Chunking and Data Extraction for Zero-Hallucination RAG

    TL;DR: To achieve near-zero hallucination in RAG pipelines, you must extract web content as structured Markdown or JSON rather than raw HTML, and apply DOM-aware semantic chunking. This preserves contextual boundaries and prevents irrelevant boilerplate or bot-challe…

  145. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

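    New methodology paper: The Calculator Discipline. A four-class taxonomy of AI-assisted disclosure hallucinations, a pre-send filter that catches the mechanical ones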

    New methodology paper: The Calculator Discipline. A four-class taxonomy of AI-assisted disclosure hallucinations, a pre-send filter that catches the mechanical ones, and two real withdrawals from my own OpenBSD work — including the one Theo de Raadt asked the right question about…

  146. dev.to — LLM tag TIER_1 English(EN) · Spicy ·

    Why Does AI "Talk Nonsense"? A Developer's Guide to Dealing with Hallucinations

    Quick version: LLMs don't look things up. They predict probable token sequences. When the model's training data is thin or absent on a topic, it doesn't stop; it keeps predicting. Fluently. Confidently. Incorrectly. If you've been building with LLMs for more than a few…

  147. dev.to — LLM tag TIER_1 English(EN) · Gabriel Anhaia ·

    Hallucination Detection at the Trace Layer: 4 Detectors You Can Ship Today

    Book: LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team (https://www.amazon.com/dp/B0GYLHMLMT). Also by me: Thinking in Go (2-book series) …

  148. dev.to — LLM tag TIER_1 English(EN) · CapeStart ·

    A Guide to Preventing AI Hallucinations

    What Are AI Hallucinations? Last quarter, something happened that made us rethink our entire approach to AI deployment. During a routine audit, we found out our customer support AI had confidently recommended a non-existent product feature to an enterprise client. T…

  149. dev.to — LLM tag TIER_1 English(EN) · Mansi Somayajula ·

    What Production ML Systems Taught Me About AI Hallucinations

    Most discussions about AI hallucinations stay at the chatbot level. “ChatGPT made up a legal case.” “The AI invented a research paper.” “The model confidently gave the wrong answer.” Interesting? Sure. But after working on production ML systems…

  150. dev.to — LLM tag TIER_1 English(EN) · Thousand Miles AI ·

    The Broken Mirror: Why AI Hallucinations Are Structural, Not a Bug

    There is a particular kind of error a language model makes that feels different from every other kind of software failure. A database returns the wrong row and you can trace the query. A null pointer crashes and the stack tells you where. But when a model confidently cites a p…

  151. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    LLM hallucinations in the wild: Large-scale evidence from non-existent citations, by Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg, Yian Yin

    LLM hallucinations in the wild: Large-scale evidence from non-existent citations Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg, Yian Yin "Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet…

  152. dev.to — LLM tag TIER_1 Português(PT) · Marcelo Cabral Ghilardi ·

    When AI Lies and Tells You to Calm Down: A Frank Conversation About Hallucinations

    Hey, folks! All good? Today I want to chat with you about a few things I've been noticing with AIs, which motivated me to record a video and even write a post on my site, marcelocabral.com.br. You know when artificial intelligence lets out…

  153. r/Anthropic TIER_1 English(EN) · /u/RouXanthica ·

    Fixed the Viral Opus 4.7 Hallucination/Reasoning Error Using Neuro-Symbolic AI

    Submitted by /u/RouXanthica. Link: https://www.reddit.com/gallery/1ti228y