New AI models and datasets advance vision-language capabilities in ophthalmology and surgery

By the PulseAugur editorial team · [5 sources] · 2026-05-04 15:19

Researchers have introduced PubMed-Ophtha, a dataset of 102,023 ophthalmology image-caption pairs extracted from scientific literature. It aims to address the scarcity of high-quality image-text data for training vision-language models in the medical field. The extraction pipeline decomposes figures from PDFs, annotates imaging modalities, and splits captions with an LLM; the authors report high accuracy on the detection and extraction tasks.

Impact: Provides a large-scale, annotated dataset to accelerate the development of specialized vision-language models in ophthalmology.

Ranking rationale: The cluster describes the release of a new dataset and associated models for research purposes.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.CV TIER_1 English(EN) · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier · 2026-05-08 04:00

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

arXiv:2605.06173v1 Announce Type: new Abstract: Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We pr…

arXiv cs.CV TIER_1 English(EN) · Andreas Maier · 2026-05-07 12:54

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework t…

arXiv cs.CV TIER_1 English(EN) · Verena Jasmin Hallitschke, Carsten Eickhoff, Philipp Berens · 2026-05-05 04:00

PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature

arXiv:2605.02720v1 Announce Type: new Abstract: Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophth…

arXiv cs.CV TIER_1 English(EN) · Chengan Che, Chao Wang, Jiayuan Huang, Xinyue Chen, Luis C. Garcia-Peraza-Herrera · 2026-05-05 04:00

Can LLM-Generated Text Empower Surgical Vision-Language Pre-training?

arXiv:2604.18134v2 Announce Type: replace Abstract: Recent advancements in self-supervised learning have led to powerful surgical vision encoders capable of spatiotemporal understanding. However, extending these visual foundations to multi-modal reasoning tasks is severely bottle…

arXiv cs.CV TIER_1 English(EN) · Philipp Berens · 2026-05-04 15:19

PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature

Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophthalmological image-caption pairs extracted from 1…
