Brief

last 24h

[6/6] 222 sources

Multi-source AI news, clustered, deduplicated, and scored 0–100 on authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 1d

Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation

Researchers have developed Adversarial Orthogonal Disentanglement (AOD), a framework for reducing hallucinations in Large Vision-Language Models (LVLMs). The method uses a minimax objective to isolate and remove hallucination-related signals from the model's internal representations. In experiments, AOD significantly improves accuracy on hallucination benchmarks while maintaining performance on general utility tasks, which suggests it captures broad biases rather than dataset-specific artifacts.

IMPACT Introduces a novel technique to improve the reliability of LVLMs by reducing factual inaccuracies in generated content.
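
The summary does not say how AOD removes a signal once it has been isolated. As a rough illustration only, one common way to strip a single direction from a hidden vector is orthogonal projection. The sketch below assumes the hallucination-related signal is a single known direction; the names `remove_direction`, `hidden`, and `direction` are hypothetical and are not taken from the paper.

```python
import math

def remove_direction(hidden, direction):
    """Project `hidden` onto the subspace orthogonal to `direction`.

    Assumption for illustration: the unwanted signal lies along one
    known direction. This is not the paper's training procedure.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        # A zero direction carries no signal to remove.
        return list(hidden)
    unit = [d / norm for d in direction]
    # Length of the component of `hidden` that lies along `unit`.
    coeff = sum(h * u for h, u in zip(hidden, unit))
    # Subtract that component; the result is orthogonal to `direction`.
    return [h - coeff * u for h, u in zip(hidden, unit)]

cleaned = remove_direction([3.0, 4.0], [0.0, 2.0])
print(cleaned)
```

In the actual method the direction or subspace would presumably be learned through the minimax objective rather than supplied by hand.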
RESEARCH · arXiv cs.AI English(EN) · 1w · [2 sources]

Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

Researchers have developed a framework to evaluate how well Large Vision Language Models (LVLMs) ground their reasoning in visual evidence, particularly for chest X-ray analysis. Existing attribution methods often fail to identify the visual cues that LVLMs actually use for their predictions, which raises concerns about clinical trustworthiness. To address this, the authors propose MedFocus, which significantly outperforms previous techniques at localizing clinically meaningful anatomical regions and measuring their causal impact on model outputs, with the aim of making medical LVLMs more reliable.

IMPACT Enhances trustworthiness of medical AI by improving the explainability of LVLM decisions in clinical settings.
TOOL · arXiv cs.CV English(EN) · 5d

COCOTree: A Dataset and Benchmark for Open Tree-Structured Visual Decomposition

Researchers have introduced COCOTree, a dataset and benchmark for open tree-structured visual decomposition. The task involves segmenting images into hierarchical trees of visual components with flexible granularity. The dataset was generated using a novel pipeline that combines Large Vision-Language Models with SAM 3 for semantic reasoning and geometric grounding, resulting in over 2.1K images and 1.8M structural nodes with an open vocabulary of 3.5K labels. The authors also propose a new evaluation metric, Open Tree Quality (OTQ), to assess mask precision, label accuracy, and structural consistency.

IMPACT Enables new research in hierarchical image segmentation and visual decomposition tasks.
TOOL · arXiv cs.CV English(EN) · 1w

SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction

Researchers have introduced SetCon, an approach to open-ended referring segmentation that treats multiple targets as a coherent set rather than as individual outputs. The method reformulates the problem as explicit set-level concept prediction, leveraging natural-language concepts generated by Large Vision Language Models (LVLMs). SetCon first predicts a broad set-level concept and then refines it into finer-grained groups, achieving state-of-the-art results on image and video benchmarks, particularly as the number of referred targets grows.

IMPACT Improves segmentation accuracy for complex, multi-target scenarios, potentially enhancing AI's ability to understand and interact with visual scenes.
TOOL · arXiv cs.AI English(EN) · 1w

Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

Researchers have developed SplitQ, a post-training quantization framework designed to make Large Vision-Language Models (LVLMs) more efficient on devices with limited resources. SplitQ addresses the accuracy degradation often seen in low-bit quantization with two modules: a Modality-specific Outlier Channel Decoupling module that isolates modality-specific outliers, and an Adaptive Cross-Modal Calibration module that corrects the remaining discrepancies. Experiments show SplitQ significantly outperforms existing methods across various quantization settings and datasets, preserving high performance even under challenging conditions.

IMPACT Enables more efficient deployment of advanced vision-language models on resource-constrained devices.
RESEARCH · arXiv cs.AI English(EN) · 4d · [2 sources]

CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs

Researchers have developed CHASD, an inference-time framework for combating hallucinations in Large Vision-Language Models (LVLMs). The method, Contrastive Hallucination-Aware Step-wise Decoding, activates a contrastive decoding branch only when token prediction confidence is low. It uses localized visual perturbations guided by attention to minimize interference with useful visual evidence, improving hallucination metrics on several benchmarks while keeping inference efficient.

IMPACT Reduces object hallucinations in vision-language models, improving reliability for multimodal AI applications.
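
The confidence-gated idea described above can be sketched in a few lines. This is a minimal illustration and not the CHASD implementation: the contrast formula `(1 + alpha) * full - alpha * perturbed` is borrowed from common visual contrastive decoding setups as an assumption, and `gated_contrastive_step`, `conf_threshold`, and `alpha` are hypothetical names and parameters. The attention-guided localized perturbation is not modeled; the perturbed-image logits are simply passed in.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_contrastive_step(logits_full, logits_perturbed,
                           conf_threshold=0.5, alpha=1.0):
    """Pick the next token id, contrasting only when confidence is low.

    logits_full:      logits computed with the original image
    logits_perturbed: logits computed with a perturbed image (assumed given)
    Returns (token_id, used_contrastive_branch).
    """
    probs = softmax(logits_full)
    confidence = max(probs)
    if confidence >= conf_threshold:
        # Confident step: plain greedy choice, contrastive branch skipped.
        return probs.index(confidence), False
    # Low-confidence step: down-weight tokens that stay likely even
    # when the visual evidence is perturbed.
    contrasted = [(1 + alpha) * f - alpha * p
                  for f, p in zip(logits_full, logits_perturbed)]
    return contrasted.index(max(contrasted)), True

# Confident step: the gate stays closed.
print(gated_contrastive_step([5.0, 0.0, 0.0], [4.0, 0.0, 0.0]))
# Uncertain step: the contrast favors the token that depends on the image.
print(gated_contrastive_step([1.0, 0.9, 0.0], [1.0, 0.2, 0.0]))
```

Gating on confidence means the second forward pass on the perturbed image is only needed for uncertain tokens, which is consistent with the efficiency claim in the summary.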
