New research tackles LVLM efficiency and hallucination problems

By PulseAugur Editorial · [4 sources] · 2026-04-23 17:54

Two new research papers address efficiency and hallucination in large vision-language models (LVLMs). The first introduces LRCP, a training-free method that uses low-rank compressibility to guide visual token pruning, reducing computational cost while maintaining performance. The second introduces HalluScope, a benchmark for prompt-induced hallucinations, together with HalluVL-DPO, a fine-tuning framework that reduces the models' reliance on textual priors and improves visual grounding.
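To make the idea of low-rank-guided token pruning concrete, here is a minimal sketch in Python with NumPy. It is not the LRCP algorithm, whose details are not given in this summary. It assumes one plausible scoring rule: a token is worth keeping if it carries a lot of energy in the dominant low-rank subspace of the visual token matrix. The function names `low_rank_token_scores` and `prune_tokens`, and the `rank` and `keep` parameters, are hypothetical.

```python
import numpy as np


def low_rank_token_scores(tokens: np.ndarray, rank: int) -> np.ndarray:
    """Score each token by its energy in the top-`rank` singular subspace.

    Illustrative assumption only; this is not the scoring rule from the paper.
    tokens has shape (num_tokens, dim).
    """
    u, s, _ = np.linalg.svd(tokens, full_matrices=False)
    r = min(rank, s.shape[0])
    # Row i of (u[:, :r] * s[:r]) holds token i's coordinates in the subspace.
    coords = u[:, :r] * s[:r]
    return np.sum(coords ** 2, axis=1)


def prune_tokens(tokens: np.ndarray, keep: int, rank: int):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    scores = low_rank_token_scores(tokens, rank)
    keep = min(keep, tokens.shape[0])
    idx = np.argsort(scores)[-keep:]
    idx.sort()  # restore positional order of the surviving tokens
    return tokens[idx], idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual_tokens = rng.normal(size=(16, 8))  # 16 tokens, 8-dim features
    kept, kept_idx = prune_tokens(visual_tokens, keep=4, rank=2)
    print(kept.shape, kept_idx)
```

Like the method described above, the sketch needs no training: it scores tokens from the feature matrix alone, so it could in principle run at inference time before the tokens reach the language model.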

IMPACT New methods for pruning visual tokens and reducing hallucinations could improve the efficiency and reliability of large vision-language models.

RANK_REASON Two distinct research papers published on arXiv and highlighted by Hugging Face, addressing core technical challenges in large vision-language models.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-15 05:09

LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs

Large vision-language models (LVLMs) achieve strong multimodal understanding, but their inference cost grows rapidly with the number of visual tokens, especially for high-resolution images and long videos. Existing attention-based methods estimate token importance from attention …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-23 17:54

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the…

arXiv cs.CV TIER_1 English(EN) · Jiawei Li · 2026-05-15 05:09

LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs

Large vision-language models (LVLMs) achieve strong multimodal understanding, but their inference cost grows rapidly with the number of visual tokens, especially for high-resolution images and long videos. Existing attention-based methods estimate token importance from attention …

arXiv cs.CV TIER_1 English(EN) · Matthieu Cord · 2026-04-23 17:54

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the…
