AI Research Tackles Hallucinations in Medical Imaging and Document Analysis

By PulseAugur Editorial · [29 sources] · 2026-06-04 22:19

Multiple research papers explore methods for detecting and mitigating hallucinations in AI systems, particularly in safety-critical applications such as medical imaging and document analysis. One study proposes a cross-modality framework for the taxonomy, detection, and mitigation of hallucinations in medical imaging AI, highlighting that general-purpose models can outperform specialized ones on hallucination benchmarks. Another paper introduces SafeLLM, which uses extraction rather than free-form rewriting in retrieval-augmented generation (RAG) over organisational documents to improve safety and reduce hallucinations. Other work probes zero-source hallucination detection with human-like criteria, applies layer-resolved optimal transport to detect hallucinations in machine translation and summarization, and uses learned CUSUM statistics and causal recurrent labelers to flag the onset of a hallucination as quickly as possible.

IMPACT Developments in hallucination detection and mitigation are crucial for the safe and reliable deployment of AI in critical domains like healthcare and compliance.

RANK_REASON Multiple research papers published on arXiv detailing novel methods for detecting and mitigating AI hallucinations.

paper
safety

AI-generated summary · Google Gemini · from 29 sources.

COVERAGE [29]

arXiv cs.AI TIER_1 English(EN) · Omar Alshahrani, Muzammil Behzad · 2026-06-12 04:00

Hallucination in Medical Imaging AI: A Cross-Modality Analytical Framework for Taxonomy, Detection, and Mitigation under Regulatory Constraints

arXiv:2606.13211v1 Announce Type: new Abstract: AI systems are being deployed across medical imaging faster than their failure modes are understood. At this point in time, the failure of greatest clinical concern is hallucination: clinically plausible but factually incorrect outp…
arXiv cs.CL TIER_1 English(EN) · Mariia Onyshchuk, Maksym-Vasyl Tarnavskyi, Marta Sumyk · 2026-06-12 04:00

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

arXiv:2606.13216v1 Announce Type: new Abstract: Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We e…
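
The following is a minimal sketch of the kind of unsupervised score this abstract describes. It rests on our own assumptions, not the paper's: cross-attention is summed over target positions into one distribution over source tokens, the reference distribution is uniform, and the ground cost between source positions i and j is |i - j| / n. The function names are hypothetical, and the layer-resolved extension the authors study is not shown.

```python
from itertools import accumulate
from typing import List, Sequence


def source_attention_mass(cross_attn: Sequence[Sequence[float]]) -> List[float]:
    """Sum a (target_len x source_len) cross-attention matrix over target
    positions and renormalise it into one distribution over source tokens."""
    n_src = len(cross_attn[0])
    col_sums = [sum(row[j] for row in cross_attn) for j in range(n_src)]
    total = sum(col_sums)
    return [c / total for c in col_sums]


def wasserstein_to_uniform(p: Sequence[float]) -> float:
    """Wasserstein-1 distance between p and the uniform distribution over the
    same source positions, placed at i / n (assumed ground cost |i - j| / n)."""
    n = len(p)
    uniform = [1.0 / n] * n
    # On an evenly spaced 1-D grid, W1 is the L1 gap between the two CDFs
    # multiplied by the grid spacing (1 / n here).
    cdf_gaps = (abs(a - b) for a, b in zip(accumulate(p), accumulate(uniform)))
    return sum(cdf_gaps) / n


# Hypothetical example: 3 target tokens attending over 4 source tokens,
# with all attention collapsed onto the first source token.
collapsed = [[1.0, 0.0, 0.0, 0.0] for _ in range(3)]
score = wasserstein_to_uniform(source_attention_mass(collapsed))  # 0.375
```

In this sketch, uniform attention scores 0. Attention that collapses onto a few source tokens, one intuition for a translation detached from its source, pushes the distance up.
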
arXiv cs.CL TIER_1 English(EN) · Julia Ive, Felix Jozsa, Evridiki Georgaki, Nabeel Sheikh, Emma Cattell, Nick Jackson, Paulina Bondaronek, Ciaran Scott Hill, Richard Dobson · 2026-06-12 04:00

SafeLLM: Extraction as a Hallucination-Resistant Alternative to Rewriting in Safety-Critical Settings

arXiv:2606.12897v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to access organisational documentation, including standard operating procedures (SOPs), HR policies and institutional guidelines. However, retrieval-augmented generation (RAG) syste…
arXiv cs.AI TIER_1 English(EN) · Igor Itkin · 2026-06-12 04:00

Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

arXiv:2606.12476v1 Announce Type: cross Abstract: Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. W…
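
The reaction-time framing maps onto classic quickest change detection. Below is textbook one-sided CUSUM over per-token scores. It assumes each score is a log-likelihood ratio, positive when the token looks hallucinated, supplied by some token-level detector. The paper's learned statistics and delay bounds are not reproduced, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def cusum_alarm(scores: Iterable[float], threshold: float) -> Optional[int]:
    """Return the index of the first token at which the one-sided CUSUM
    statistic exceeds `threshold`, or None if no alarm is raised.

    Each score is assumed to be a per-token log-likelihood ratio:
    positive when the token looks more like hallucinated text."""
    s = 0.0
    for t, x in enumerate(scores):
        # Accumulate evidence, but never let the statistic drop below zero,
        # so a long faithful prefix cannot mask a later change.
        s = max(0.0, s + x)
        if s > threshold:
            return t
    return None


def detection_delay(alarm: Optional[int], onset: int) -> Optional[int]:
    """Tokens elapsed between hallucination onset and the alarm.
    A negative value means the alarm fired before onset (a false alarm);
    None means the monitor never fired."""
    if alarm is None:
        return None
    return alarm - onset


# Hypothetical scores: faithful tokens score -1, hallucinated ones +2 from index 3.
scores = [-1.0, -1.0, -1.0, 2.0, 2.0, 2.0]
alarm = cusum_alarm(scores, threshold=3.0)   # 4
delay = detection_delay(alarm, onset=3)      # 1
```

Raising the threshold buys fewer false alarms before onset at the cost of a longer delay after it. That is the trade-off a streaming monitor is judged on, rather than token-level AUC.
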
arXiv cs.AI TIER_1 English(EN) · Jiahao Yang, Shuhai Zhang, Hailong Kang, Feng Liu, Qi Chen, Mingkui Tan · 2026-06-12 04:00

Zero-source LLM Hallucination Detection with Human-like Criteria Probing

arXiv:2606.12900v1 Announce Type: new Abstract: Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source cons…
arXiv cs.CL TIER_1 English(EN) · Marta Sumyk · 2026-06-11 11:30

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We extend this analysis to all six decoder layers of…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 11:30

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We extend this analysis to all six decoder layers of…
arXiv cs.AI TIER_1 English(EN) · Muzammil Behzad · 2026-06-11 11:19

Hallucination in Medical Imaging AI: A Cross-Modality Analytical Framework for Taxonomy, Detection, and Mitigation under Regulatory Constraints

AI systems are being deployed across medical imaging faster than their failure modes are understood. At this point in time, the failure of greatest clinical concern is hallucination: clinically plausible but factually incorrect outputs, including fabricated anatomical structures,…
arXiv cs.CL TIER_1 English(EN) · Mingkui Tan · 2026-06-11 04:58

Zero-source LLM Hallucination Detection with Human-like Criteria Probing

Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source constraint, where no model internals or external ref…
arXiv cs.CL TIER_1 English(EN) · Richard Dobson · 2026-06-11 04:55

SafeLLM: Extraction as a Hallucination-Resistant Alternative to Rewriting in Safety-Critical Settings

Large language models (LLMs) are increasingly used to access organisational documentation, including standard operating procedures (SOPs), HR policies and institutional guidelines. However, retrieval-augmented generation (RAG) systems that rely on free-form rewriting can introduc…
arXiv cs.AI TIER_1 English(EN) · Md. Rejaul Korim Sadi, Toufiqur Rahman Tasin, Golam Mostofa Naeem · 2026-06-11 04:00

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

arXiv:2606.07537v1 Announce Type: cross Abstract: Large language models hallucinate (producing fluent, confident, factually wrong outputs) with a consistency that persists across generations and scales. Existing taxonomies classify hallucination by output type, distinguishing int…
arXiv cs.AI TIER_1 English(EN) · Nina I. Shamsi · 2026-06-10 04:00

Density Ridge Selective Prediction for LLM and VLM Hallucination Detection under Calibration Label Scarcity

arXiv:2606.10198v1 Announce Type: cross Abstract: Hallucination detection in large language and vision-language models is increasingly framed as selective prediction, where a detector assigns a confidence score and abstains when confidence is low. Unsupervised sampling detectors …
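
For readers new to the selective-prediction framing, here is a plain baseline built on our own assumptions. It assumes a small labelled calibration set is available and picks the lowest confidence threshold whose accepted set stays within a target error rate. That labelled data is exactly what the paper treats as scarce, so this sketch illustrates the setup, not the proposed density-ridge method. The function names are hypothetical.

```python
from typing import Optional, Sequence


def choose_threshold(confidences: Sequence[float], is_correct: Sequence[bool],
                     max_risk: float) -> Optional[float]:
    """Lowest confidence threshold whose accepted set (confidence >= threshold)
    has an error rate <= max_risk on labelled calibration data, or None."""
    pairs = sorted(zip(confidences, is_correct), key=lambda p: p[0], reverse=True)
    best = None
    errors = 0
    for k, (conf, ok) in enumerate(pairs, start=1):
        errors += 0 if ok else 1
        # Only evaluate at the end of a run of tied confidences, because a
        # threshold cannot accept some tied items and reject others.
        last_of_tie = k == len(pairs) or pairs[k][0] != conf
        if last_of_tie and errors / k <= max_risk:
            best = conf
    return best


def accept(confidence: float, threshold: Optional[float]) -> bool:
    """True to keep the model's output, False to abstain."""
    return threshold is not None and confidence >= threshold
```

Consider confidences [0.9, 0.8, 0.7, 0.6] with correctness [True, True, False, False]. A zero-risk target yields a threshold of 0.8, while a 50% risk target admits every example and yields 0.6.
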
arXiv cs.LG TIER_1 English(EN) · Ruipeng Zhang, Zhihao Li, C. L. Philip Chen, Tong Zhang · 2026-06-09 04:00

Steer Where It Matters: Token-Level Visual-Sensitivity Steering for LVLMs Hallucination Mitigation

arXiv:2606.07647v1 Announce Type: cross Abstract: Large vision language models (LVLMs) have made rapid advancements and are deployed across various applications, yet hallucinations remain a major challenge. Activation steering is appealing due to its minimal training overhead and…
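
In its simplest form, activation steering adds a fixed direction to hidden states at inference time. The sketch below restricts that shift to tokens whose sensitivity score exceeds a threshold, which is one reading of "token-level" steering. The sensitivity scores, steering direction, and strength `alpha` are hypothetical inputs, not the paper's definitions.

```python
from typing import List, Sequence


def steer_selected_tokens(hidden: Sequence[Sequence[float]],
                          sensitivity: Sequence[float],
                          direction: Sequence[float],
                          alpha: float,
                          threshold: float) -> List[List[float]]:
    """Return copies of per-token hidden states, adding alpha * direction only
    where the token's sensitivity score exceeds threshold."""
    steered = []
    for h, s in zip(hidden, sensitivity):
        if s > threshold:
            steered.append([x + alpha * d for x, d in zip(h, direction)])
        else:
            steered.append(list(h))  # untouched tokens keep their activations
    return steered


# Hypothetical two-token, two-dimensional example.
out = steer_selected_tokens([[0.0, 0.0], [1.0, 1.0]], [0.1, 0.9],
                            [1.0, -1.0], alpha=2.0, threshold=0.5)
# out == [[0.0, 0.0], [3.0, -1.0]]
```

Tokens below the threshold pass through unchanged, so the intervention is confined to positions the sensitivity signal flags. Steering every token with the same vector would be the global alternative.
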
arXiv cs.LG TIER_1 English(EN) · Kostas Triaridis, Alexandros Graikos, Aggelina Chatziagapi, Grigorios G. Chrysos, Dimitris Samaras · 2026-06-09 04:00

Mitigating Diffusion Model Hallucinations with Dynamic Guidance

arXiv:2510.05356v2 Announce Type: replace-cross Abstract: Hallucinations in diffusion models are samples with structural inconsistencies that can emerge due to the excessive smoothing of the learned score function, which in turn leads to interpolations between modes of the data d…
arXiv cs.AI TIER_1 English(EN) · Abhivansh Gupta, Simardeep Singh, Advika Sinha, Shreyansh Modi, Akshat Tomar · 2026-06-09 04:00

How Many Counterfactuals Does It Take? Probing VLM Hallucinations Through Circuits and Causal Effects

arXiv:2606.08777v1 Announce Type: cross Abstract: Visual Language Models (VLMs) are known to produce hallucinated predictions that are not grounded in visual evidence, yet existing approaches lack a principled understanding of how robust such predictions are under counterfactual …
arXiv cs.AI TIER_1 English(EN) · Naveen Bera, Pulijala Sai Nikhila, Kondaguduru Abhiram, Shaik Gayaz Ali, Shoaib Sadiq Salehmohamed, Shaik Mohammed Omar, Jinal Prashant Thakkar, Hansika Aredla, Shalmali Ayachit · 2026-06-09 04:00

BEACON: Behavioral Entropy Aggregation for Cross-Model Hallucination Detection in Large Language Models

arXiv:2606.07528v1 Announce Type: cross Abstract: Hallucination in large language models (LLMs), defined as the generation of factually incorrect or unsupported content, remains a critical barrier to reliable deployment. We present BEACON (Behavioral Entropy Aggregation for Cross…
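
Below is a generic example of a sampling-based score. It assumes answers to the same prompt are sampled from one or more models and compared after light normalisation (lower-casing and collapsing whitespace). High entropy across the pooled samples suggests the models do not agree on an answer. This is an illustrative baseline with hypothetical names, not BEACON's aggregation, whose details are truncated here.

```python
import math
from collections import Counter
from typing import Sequence


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of sampled
    answers after lower-casing and collapsing whitespace."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalised = [" ".join(a.lower().split()) for a in answers]
    counts = Counter(normalised)
    n = len(normalised)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical samples pooled from two models for the same question.
model_a = ["Paris", "paris "]
model_b = ["Lyon", "lyon"]
score = answer_entropy(model_a + model_b)  # log(2), about 0.693
```

Exact-match normalisation is the crudest way to decide that two answers agree. Sampling-based detectors often group answers by meaning instead, which this sketch does not attempt.
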
arXiv cs.AI TIER_1 English(EN) · Sanchita Porwal, Sai Prasath S, Xingjian Bi, Madelyn Scandlen · 2026-06-09 04:00

Evaluating Hallucinations in Domain-Adapted Large Language Models

arXiv:2606.07521v1 Announce Type: cross Abstract: This study investigates the phenomenon of hallucinations in domain-adapted Large Language Models (LLMs), focusing on the fine-tuning of the Llama-2 model with the Lamini dataset. Hallucinations, or the generation of nonsensical or…
arXiv cs.AI TIER_1 English(EN) · Shanshan Lin, Dongsheng Hong, Sibo Ju, Chao Chen, Xi Zhang, Xiangwen Liao · 2026-06-09 04:00

Constrained Paraphrase Consistency for LLM Hallucination Detection

arXiv:2606.08158v1 Announce Type: cross Abstract: Large language models (LLMs) can generate factually inconsistent claims, motivating accurate and scalable hallucination detectors. Prior work largely enlarges training sets via synthesis or new annotations, introducing increasing …
arXiv cs.AI TIER_1 English(EN) · Xinyi Li, Zhen Fang, Yongxin Deng, Jinyuan Luo, Hongnan Ma, Changdae Oh, Zijing Shi, Shanshan Ye, Hanchen Wang, Shu-Lin Chen, Yadan Luo, Mengyue Yang, Sean Du, Sharon Li, Ling Chen · 2026-06-08 04:00

OpenHalDet: A Unified Benchmark for Hallucination Detection across Diverse Generation Scenarios

arXiv:2606.06959v1 Announce Type: cross Abstract: Hallucination detection is essential for the reliable deployment of large language models (LLMs). However, existing evaluations face two core challenges: inconsistent inference configuration and evaluation, and limited coverage of…
arXiv cs.AI TIER_1 English(EN) · Jianru Shen · 2026-06-08 04:00

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

arXiv:2606.06748v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural r…
arXiv cs.AI TIER_1 English(EN) · Georgii Aparin, Vadim Popov, Tasnima Sadekova, Assel Yermekova · 2026-06-08 04:00

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

arXiv:2606.07473v1 Announce Type: cross Abstract: Whisper, a widely adopted ASR model, is known to suffer from hallucinations: coherent transcriptions generated for non-speech audio entirely disconnected from the input. We investigate whether hallucinations can be detected and m…
arXiv cs.CL TIER_1 English(EN) · Xiangwen Liao · 2026-06-06 13:14

Constrained Paraphrase Consistency for LLM Hallucination Detection

Large language models (LLMs) can generate factually inconsistent claims, motivating accurate and scalable hallucination detectors. Prior work largely enlarges training sets via synthesis or new annotations, introducing increasing cost and potential bias while underusing the consi…
arXiv cs.CL TIER_1 English(EN) · Xiangwen Liao · 2026-06-06 13:13

Cross Paraphrastic Invariance Learning for Hallucination Detection

Large language models (LLMs) frequently generate hallucinations, which are unsupported by a source document. To avoid costly LLM-as-evaluator pipelines and the heavy annotation demands of existing classifiers, we propose CPIL (Cross Paraphrastic Invariance Learning), a two-stage …
arXiv cs.AI TIER_1 English(EN) · Assel Yermekova · 2026-06-05 17:26

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Whisper, a widely adopted ASR model, is known to suffer from hallucinations: coherent transcriptions generated for non-speech audio entirely disconnected from the input. We investigate whether hallucinations can be detected and mitigated through Whisper's internal representation…
arXiv cs.CL TIER_1 English(EN) · Ling Chen · 2026-06-05 06:38

OpenHalDet: A Unified Benchmark for Hallucination Detection across Diverse Generation Scenarios

Hallucination detection is essential for the reliable deployment of large language models (LLMs). However, existing evaluations face two core challenges: inconsistent inference configuration and evaluation, and limited coverage of downstream domains and tasks. Consequently, repor…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-05 00:00

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Research demonstrates that hallucinations in Whisper ASR can be detected and reduced using internal representations from audio encoder activations and Sparse AutoEncoder latents, achieving significant hallucination rate reduction with minimal speech transcription degradation.
arXiv cs.CL TIER_1 English(EN) · Jianru Shen · 2026-06-04 22:19

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural relationships among evidence pieces and answer clai…
Towards AI TIER_1 English(EN) · Mohd Faraz · 2026-06-11 18:01

Hallucination Is a Memory Problem: Why No Amount of RLHF Will Fix It

LLMs don’t hallucinate because they’re broken. They hallucinate because of how they store knowledge, and RLHF, RAG, and bigger context windows are all treating the wrong thing. Here’s what’s actually going on.
r/LocalLLaMA TIER_1 English(EN) · /u/Upset-Presentation28 · 2026-06-09 16:23

Our ICML paper on predictable hallucination (information-budget abstention gate), + ntkMirror: a training-free open-weight implementation we're releasing today

Our paper, Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication, was accepted at ICML 2026. Paper: https://arxiv.org/abs/2509.11208 …
