New AI system OR3 improves operating room video retrieval via action-driven digital twins

By PulseAugur Editorial · [1 source] · 2026-06-17 04:00

Researchers have developed OR3, a text-to-video retrieval system intended to support operating room (OR) safety by retrieving recordings of specific surgical events. Unlike previous methods that relied on global embeddings, OR3 converts each video clip into an action-driven digital twin (ActDT), which groups subject-action-object triplets within temporal intervals. This enables imagination-based retrieval: a large language model generates hypothetical ActDTs from a text query, so the query and the clips are compared in the same representation (intra-modal matching) rather than across text and video. The system was tested on a benchmark of robotic knee procedures, where it demonstrated superior fine-grained discrimination between visually similar clips.
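
To make the idea concrete, here is a minimal Python sketch of what an ActDT-style record and an intra-modal match could look like. It is not the authors' implementation: the class names, the `hands_over`/`holds` vocabulary, and the Jaccard overlap score are all assumptions chosen for illustration, and the "hypothetical" query ActDT is written by hand where OR3 would use a large language model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Assumed schema: the summary only says an ActDT groups
# subject-action-object triplets within temporal intervals.
Triplet = Tuple[str, str, str]  # (subject, action, object)


@dataclass(frozen=True)
class Interval:
    start: float  # seconds from clip start
    end: float
    triplets: FrozenSet[Triplet]


@dataclass(frozen=True)
class ActDT:
    clip_id: str
    intervals: Tuple[Interval, ...]

    def all_triplets(self) -> FrozenSet[Triplet]:
        # Flatten every interval into one set of triplets.
        return frozenset(t for iv in self.intervals for t in iv.triplets)


def jaccard(a: FrozenSet[Triplet], b: FrozenSet[Triplet]) -> float:
    # Stand-in similarity (an assumption): shared triplets / distinct triplets.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank(query_dt: ActDT, clips: List[ActDT]) -> List[Tuple[str, float]]:
    # Intra-modal matching: query and clips are both ActDTs.
    q = query_dt.all_triplets()
    scored = [(c.clip_id, jaccard(q, c.all_triplets())) for c in clips]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two visually similar clips that differ only in the instrument involved.
clip_a = ActDT("clip_a", (
    Interval(0.0, 4.0, frozenset({("nurse", "hands_over", "saw")})),
    Interval(4.0, 10.0, frozenset({("surgeon", "holds", "saw"),
                                   ("surgeon", "cuts", "bone")})),
))
clip_b = ActDT("clip_b", (
    Interval(0.0, 4.0, frozenset({("nurse", "hands_over", "drill")})),
    Interval(4.0, 10.0, frozenset({("surgeon", "holds", "drill"),
                                   ("surgeon", "drills", "bone")})),
))

# Hand-written stand-in for an LLM-generated hypothetical ActDT for the
# query "the nurse passes the saw to the surgeon".
query_dt = ActDT("query", (
    Interval(0.0, 1.0, frozenset({("nurse", "hands_over", "saw"),
                                  ("surgeon", "holds", "saw")})),
))

ranking = rank(query_dt, [clip_b, clip_a])
print(ranking)  # clip_a ranks first; clip_b shares no triplets with the query
```

The sketch deliberately ignores the order and duration of intervals; a matcher that respects temporal structure would need more than a set overlap. It shows only why comparing structured triplets can separate clips that a single global embedding might place close together.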

IMPACT Supports operating room safety review by letting stakeholders retrieve recordings of specific surgical events more precisely.

RANK_REASON The cluster contains an academic paper detailing a new AI method for video retrieval.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Yiqing Shen, Hao Ding, Mathias Unberath · 2026-06-17 04:00

Reasoning Text-to-Video Retrieval for Operating Room Clips via Action-Driven Digital Twins

arXiv:2606.17298v1 Announce Type: new Abstract: Text-to-video retrieval in operating rooms (OR) is an enabling technology for OR safety, as it allows stakeholders to retrieve and inspect recordings of specific events. However, because the most safety-critical events may not follo…
