CuriosAI submits CASTLE Challenge paper with SVA and TMKG approaches

By PulseAugur Editorial · [1 source] · 2026-05-28 04:00

CuriosAI has submitted a paper describing its approach to the CASTLE Challenge, which poses multiple-choice questions over a large body of egocentric video. The primary method, SVA (Search-Verify-Answer), is a three-stage pipeline that refines candidate answers using a Vision-Language Model (VLM) and an LLM judge; it achieved an accuracy of 0.50. A secondary approach, TMKG (Temporal-Multimodal-Knowledge-Graph), builds a knowledge graph from the video data and achieved a lower accuracy of 0.35.
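To make the three-stage idea concrete, here is a minimal toy sketch of a search-verify-answer loop. It is not the CuriosAI implementation: the summary does not describe the stages in detail, so the names Clip, search, verify, and answer are hypothetical, word overlap stands in for retrieval, and simple string matching stands in for the VLM check and the LLM judge.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    caption: str  # hypothetical text description of a video segment


def search(question, clips, top_k=2):
    """Stage 1 (toy): rank clips by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = [(c, len(q_words & set(c.caption.lower().split()))) for c in clips]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def verify(option, evidence):
    """Stage 2 (toy): stand-in for a VLM check.

    Sums the retrieval scores of the evidence clips that mention the option.
    """
    return sum(score for clip, score in evidence
               if option.lower() in clip.caption.lower())


def answer(question, options, clips):
    """Stage 3 (toy): stand-in for an LLM judge; picks the best-supported option."""
    evidence = search(question, clips)
    support = {opt: verify(opt, evidence) for opt in options}
    return max(options, key=lambda opt: support[opt])


clips = [
    Clip("a", "Alice cooks pasta in the kitchen"),
    Clip("b", "Bob plays guitar in the living room"),
    Clip("c", "Carol reads a book in the garden"),
]
choice = answer("Who cooks pasta in the kitchen", ["Alice", "Bob", "Carol"], clips)
print(choice)
```

In a real system each stage would call a model rather than compare strings; the sketch only shows how candidate evidence is retrieved first, each option is checked against it, and a final selection is made from the checked options.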

IMPACT This research explores novel methods for video understanding and question answering, potentially advancing multimodal AI capabilities.

RANK_REASON The cluster contains a research paper submission for a challenge, detailing novel approaches.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Yuto Kanda, Hayato Tanoue, Takayuki Hori · 2026-05-28 04:00

CuriosAI Submission to the CASTLE Challenge at EgoVis 2026

arXiv:2605.27800v1. Abstract: CASTLE 2026 asks 185 multiple-choice questions over 600+ hours of synchronized multi-view egocentric video. We explore two approaches on top of a shared multimodal preprocessing layer, including per-person timelines, speaker-resolve…
