CuriosAI Submission to the CASTLE Challenge at EgoVis 2026
CuriosAI has submitted a paper detailing its approach to the CASTLE Challenge, which involves answering multiple-choice questions based on extensive egocentric video data. Its primary method, SVA (Search-Verify-Answer), employs a three-stage pipeline that refines potential answers using a Vision-Language Model (VLM) and an LLM judge, achieving an accuracy of 0.50. A secondary approach, TMKG (Temporal-Multimodal-Knowledge-Graph), builds a knowledge graph from the video data but achieved a lower accuracy of 0.35.
AI IMPACT: This research explores novel methods for video understanding and question answering, potentially advancing multimodal AI capabilities.
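The summary gives only the outline of SVA: three stages, a VLM, and an LLM judge. The sketch below shows one way such a search, verify, answer flow could be wired together. It is an illustration under assumptions, not the submission's implementation: the `Evidence` type, the function names, the assignment of the VLM to the verify stage, and the toy stand-ins for the models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Evidence:
    """Hypothetical container for one retrieved video clip and a note about it."""
    clip_id: str
    note: str


def search_verify_answer(
    question: str,
    options: Sequence[str],
    search: Callable[[str], List[Evidence]],
    verify: Callable[[str, Evidence], bool],
    judge: Callable[[str, Sequence[str], List[Evidence]], int],
) -> int:
    """Return the index of the chosen option (assumed three-stage flow)."""
    # Stage 1 (Search): collect candidate evidence for the question.
    candidates = search(question)
    # Stage 2 (Verify): keep only evidence the verifier accepts.
    # In a real system this is where a VLM could inspect each clip (assumption).
    verified = [ev for ev in candidates if verify(question, ev)]
    # Stage 3 (Answer): a judge (an LLM in the real system) picks one option.
    choice = judge(question, options, verified)
    if not 0 <= choice < len(options):
        raise ValueError(f"judge returned invalid option index: {choice}")
    return choice


# Toy stand-ins so the sketch runs without any model; all behavior is made up.
def toy_search(question: str) -> List[Evidence]:
    return [
        Evidence("clip_a", "person pours coffee"),
        Evidence("clip_b", "empty hallway"),
    ]


def toy_verify(question: str, ev: Evidence) -> bool:
    return "coffee" in ev.note


def toy_judge(question: str, options: Sequence[str], evidence: List[Evidence]) -> int:
    # Pick the first option mentioned in any verified note, else fall back to 0.
    for i, option in enumerate(options):
        if any(option in ev.note for ev in evidence):
            return i
    return 0
```

Passing the three stages in as callables keeps the control flow separate from any particular model, so a retrieval system, a VLM call, and an LLM judge could each be swapped in without changing `search_verify_answer`.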