Researchers have developed SIEVES, a method for improving the reliability of multimodal large language models (MLLMs) in out-of-distribution scenarios. SIEVES learns to estimate the quality of the visual evidence that a reasoning model provides, which enables selective prediction: the system answers only when the evidence is judged reliable and abstains otherwise. This approach increases coverage by up to three times on challenging benchmarks. Notably, SIEVES can be applied to proprietary models such as Gemini-3-Pro without requiring access to their internal weights or logits.
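To make "coverage" concrete, here is a minimal sketch of generic selective prediction. It is not the SIEVES method itself. It assumes some external scorer has already produced a quality score per answer, and the names `selective_metrics` and `coverage_at_risk` are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple


def selective_metrics(
    scores: List[float], correct: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (coverage, selective_risk) when answering only if score >= threshold.

    scores  : assumed quality estimates, one per question (higher = more trusted)
    correct : whether the model's answer to each question was right
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    # Keep only the questions the system chooses to answer.
    answered = [c for s, c in zip(scores, correct) if s >= threshold]
    # Coverage: fraction of all questions that were answered.
    coverage = len(answered) / len(scores) if scores else 0.0
    # Selective risk: error rate among the answered questions only.
    risk = 1 - sum(answered) / len(answered) if answered else 0.0
    return coverage, risk


def coverage_at_risk(
    scores: List[float], correct: List[bool], max_risk: float
) -> float:
    """Largest coverage reachable by any threshold whose selective risk <= max_risk."""
    best = 0.0
    for t in sorted(set(scores)):
        cov, risk = selective_metrics(scores, correct, t)
        if risk <= max_risk and cov > best:
            best = cov
    return best
```

A better quality estimator ranks correct answers above incorrect ones more consistently, so more questions can be answered before the error rate exceeds the allowed risk. That is the sense in which a selector can raise coverage without touching the underlying model's weights or logits.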
Impact: Enhances MLLM reliability in real-world scenarios by improving selective prediction and generalization to unseen data.
Ranking rationale: Academic paper introducing a new method for multimodal LLM generalization.
- AdVQA
- Gemini-3-Pro
- Hector Garcia Rodriguez
- HR-Bench-8k
- MLLMs
- MME-RealWorld-Lite
- o3
- Pixel-Reasoner
- V* Bench
- VizWiz
AI-generated summary · Google Gemini · from 3 sources.