New dataset enhances AI understanding of surgical videos

By PulseAugur Editorial · [2 sources] · 2026-06-24 14:53

Researchers have developed SurgSTU-Pipeline, a method for building SurgSTU, a new dataset for fine-grained spatial-temporal understanding of surgical videos. The pipeline addresses the limitations of existing datasets and the drawbacks of relying on manual annotation or LLM-generated data. SurgSTU includes over 6,700 video clips with 150,000 question-answer pairs. The work reports that generalist vision-language models struggle initially but can be improved through in-context learning and fine-tuning on the dataset.

IMPACT This specialized dataset and pipeline could significantly improve the accuracy and capabilities of AI systems in analyzing surgical procedures, potentially leading to better computer-assisted surgery tools.

RANK_REASON The cluster contains an academic paper detailing a new method and dataset for AI research.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-24 14:53

SurgAtlas: A Large-Scale Surgical Video-Language Dataset with 2,391 Hours of Open and Minimally Invasive Surgery

We introduce SurgAtlas, the largest surgical video-language dataset to date, comprising 15,291 videos (2,391 hours) spanning 18 surgical specialties and over 5,000 procedure types, sourced entirely from publicly available YouTube content. SurgAtlas is also the first surgical vide…

arXiv cs.CV TIER_1 English(EN) · Lennart Maack, Alexander Schlaefer · 2026-06-29 04:00

An Approach to Enriching Surgical Video Datasets for Fine-Grained Spatial-Temporal Understanding of Vision-Language Models

arXiv:2604.00784v2 · Abstract: Surgical video understanding is a crucial prerequisite for advancing Computer-Assisted Surgery. While vision-language models (VLMs) have recently been applied to the surgical domain, existing surgical vision-language datasets la…
