SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory
New VQA benchmarks address memory, emotion, and interpretability
By the PulseAugur editorial team · [8 sources]
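Researchers are developing new benchmarks and methods for advanced visual question answering (VQA) tasks. One approach extracts answer set programming (ASP) rules from large language models to make neuro-symbolic VQA systems more interpretable. Another notable development is the SuperMemory-VQA dataset. It uses AI glasses to capture long-horizon egocentric video and evaluates AI assistants on realistic memory-recall tasks. In addition, the InsightVQA benchmark targets visual emotion understanding and cognitive reasoning. It provides a large-scale dataset of hierarchical question-answer pairs covering these complex aspects.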
arXiv:2606.06534v1 Announce Type: cross Abstract: Longitudinal medical visual question answering (VQA) requires reasoning about anatomical differences between an image of a current time point and an image of a referred time point. We propose an attention-guided encoder-decoder fo…
arXiv cs.AI
Thomas Eiter, Nelson Higuera Ruiz, Johannes Oetsch
arXiv:2606.03269v1 Announce Type: new Abstract: Visual Question Answering (VQA) is the task of answering questions about images, requiring the integration of multimodal input and reasoning. Modular approaches that incorporate logic-based representations into the reasoning compone…
AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address memory gaps that humans experience for practical, personal, or social purposes over …
SuperMemory-VQA is introduced as an egocentric visual question answering dataset designed to evaluate AI assistants on long-term memory tasks through real-world activities recorded with AI glasses.
arXiv cs.CV
I Putu Adi Pratama, Bahadorreza Ofoghi, Atul Sajjanhar, Shang Gao
arXiv:2606.05535v1 Announce Type: new Abstract: Medical visual question answering (Med-VQA) has strong potential for clinical decision support by enabling AI models to interpret medical images and answer clinically relevant queries. Recent approaches typically connect off-the-she…
arXiv cs.CV
Shiyu Wang (East China Normal University, Shanghai, China), Ziyu Liu (East China Normal University, Shanghai, China), Chaoyi Yu (East China Normal University, Shanghai, China), Yujie Yin (East China Normal University, Shanghai, China), Zhongqian Mao (Eas…
arXiv:2606.02171v1 Announce Type: new Abstract: Visual emotion understanding requires models not only to recognize emotional states, but also to understand why they arise and to perform higher-level cognitive reasoning. However, existing benchmarks mainly focus on emotion recognition, offering…
arXiv cs.CV
Samiul Alam, Shakhrul Iman Siam, Michael J. Proulx, James Fort, Richard Newcombe, Hyo Jin Kim, Mi Zhang
arXiv:2606.00825v1 Announce Type: new Abstract: AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address memory gaps that humans experience f…
Visual emotion understanding requires models not only to recognize emotional states, but also to understand why they arise and to perform higher-level cognitive reasoning. However, existing benchmarks mainly focus on emotion recognition, offering limited support for grounded understanding and …