Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
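
As a rough illustration of how a 0–100 score could combine these four signals, here is a minimal sketch. The weights, the 24-hour half-life, the 0..1 input ranges, and the names `Story` and `score` are all assumptions made for the example; the brief does not publish its actual formula.

```python
from dataclasses import dataclass


@dataclass
class Story:
    authority: float         # 0..1, assumed source-authority signal
    cluster_strength: float  # 0..1, assumed: how many sources cover the story
    headline_signal: float   # 0..1, assumed headline relevance signal
    age_hours: float         # hours since publication


# Hypothetical weights (sum to 1) and half-life; not taken from the brief.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0


def score(story: Story) -> float:
    """Weighted sum of the three signals, scaled by exponential time decay."""
    base = (
        WEIGHTS["authority"] * story.authority
        + WEIGHTS["cluster_strength"] * story.cluster_strength
        + WEIGHTS["headline_signal"] * story.headline_signal
    )
    base = max(0.0, min(1.0, base))  # clamp to 0..1 before scaling
    decay = 0.5 ** (story.age_hours / HALF_LIFE_HOURS)  # halves every 24h
    return round(100.0 * base * decay, 1)


fresh = Story(authority=0.9, cluster_strength=0.6, headline_signal=0.7, age_hours=8)
stale = Story(authority=0.9, cluster_strength=0.6, headline_signal=0.7, age_hours=336)
print(score(fresh), score(stale))
```

Under these assumptions, two otherwise identical stories differ only by the decay factor, so an 8-hour-old item outranks a two-week-old one.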

TOOL · arXiv cs.AI English(EN) · 8h

Inference-time Policy Steering via Vision and Touch

Researchers have developed ViTaL, a framework for steering pre-trained generative robot policies at deployment time. It uses both visual and tactile information to refine candidate actions before execution, addressing the limitations of vision-only methods in contact-rich manipulation. ViTaL formulates multimodal guidance as a bi-level optimization problem: visual sampling handles long-horizon mode selection, and tactile-guided diffusion editing handles short-horizon refinement. A visuo-tactile latent world model and learned verifiers, including a text-conditioned tactile reward, are used to improve success rates on real-world manipulation tasks.

IMPACT Enhances robot manipulation capabilities by integrating multimodal sensory feedback for improved action selection and refinement.
- ViTaL
- arXiv
- cs.AI
- cs.RO
- alphaXiv
- CatalyzeX
- DagsHub
- Gotit.pub
- Hugging Face
- ScienceCast
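
The select-then-refine idea in the ViTaL summary above can be illustrated with a toy sketch. This is not ViTaL's method: the two scoring functions are hypothetical stand-ins for learned verifiers, and simple random-perturbation hill climbing is swapped in for the tactile-guided diffusion editing the summary describes. All names and parameters are assumptions for illustration.

```python
import random


def visual_score(action, goal):
    """Hypothetical visual verifier: higher is better (negative squared distance)."""
    return -sum((a - g) ** 2 for a, g in zip(action, goal))


def tactile_reward(action, contact_target):
    """Hypothetical tactile reward: higher is better (negative squared distance)."""
    return -sum((a - c) ** 2 for a, c in zip(action, contact_target))


def steer(candidates, goal, contact_target, steps=20, step_size=0.05, seed=0):
    """Pick the best candidate by visual score, then refine it by tactile reward."""
    rng = random.Random(seed)
    # Stage 1: choose among sampled candidate actions (coarse selection).
    best = max(candidates, key=lambda a: visual_score(a, goal))
    # Stage 2: local refinement; keep a perturbation only if the reward improves.
    current = list(best)
    current_reward = tactile_reward(current, contact_target)
    for _ in range(steps):
        proposal = [a + rng.gauss(0.0, step_size) for a in current]
        proposal_reward = tactile_reward(proposal, contact_target)
        if proposal_reward > current_reward:
            current, current_reward = proposal, proposal_reward
    return best, current


candidates = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.4]]
selected, refined = steer(candidates, goal=[0.5, 0.5], contact_target=[0.55, 0.45])
print(selected, refined)
```

Because a proposal is accepted only when it raises the tactile reward, the refined action never scores worse on that reward than the candidate it started from.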

RESEARCH · arXiv cs.AI English(EN) · 2w · [2 sources]

VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs

Researchers have introduced VITAL, a framework for improving latent reasoning in medical multimodal large language models (MLLMs). It targets modality collapse and poor interpretability with a dual supervision strategy built on an auxiliary text decoder and a visual projector. Both can be detached at inference to preserve efficiency, while still allowing post-hoc interpretability through textual and visual explanations. The authors report state-of-the-art results on multiple benchmarks, outperforming existing methods and competing with trillion-parameter proprietary models.

IMPACT Enhances interpretability and performance of medical AI systems, potentially improving clinical decision-making.
- VITAL
- medical MLLMs
