PulseAugur
research · [5 sources]

New AI models and datasets advance vision-language capabilities in ophthalmology and surgery

Researchers have introduced PubMed-Ophtha, a new dataset of 102,023 ophthalmology image-caption pairs extracted from the scientific literature, aiming to address the scarcity of high-quality training data for medical vision-language models. The extraction pipeline decomposes multi-panel figures from PDFs, annotates imaging modalities, and uses LLM-based caption splitting to align caption segments with individual panels, achieving high accuracy on detection and extraction tasks.
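The caption-splitting step can be illustrated with a simplified sketch. The paper's pipeline uses an LLM for this; the regex heuristic below is a hypothetical stand-in (not the authors' implementation) that shows the expected input/output shape — a compound multi-panel caption mapped to per-panel captions keyed by panel letter:

```python
import re
from typing import Dict

def split_compound_caption(caption: str) -> Dict[str, str]:
    """Split a multi-panel figure caption such as
    '(A) Fundus photograph. (B) OCT scan.' into per-panel captions.

    Simplified illustration only: PubMed-Ophtha uses LLM-based
    splitting, which also handles free-form panel references that
    this regex would miss.
    """
    # Split on panel markers like "(A)", keeping the letter
    # via the capturing group at odd indices of the result.
    parts = re.split(r"\(([A-Z])\)\s*", caption)
    panels: Dict[str, str] = {}
    for i in range(1, len(parts) - 1, 2):
        panels[parts[i]] = parts[i + 1].strip()
    return panels
```

A caption with no recognizable panel markers yields an empty mapping, which a real pipeline would route to single-panel handling instead.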

Summary written by gemini-2.5-flash-lite from 5 sources.

IMPACT Provides a large-scale, annotated dataset to accelerate the development of specialized vision-language models in ophthalmology.

RANK_REASON The cluster describes the release of a new dataset and associated models for research purposes.


COVERAGE [5]

  1. arXiv cs.CV TIER_1 · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier

    Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

    arXiv:2605.06173v1 (new): Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We pr…

  2. arXiv cs.CV TIER_1 · Andreas Maier

    Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

    Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework t…

  3. arXiv cs.CV TIER_1 · Verena Jasmin Hallitschke, Carsten Eickhoff, Philipp Berens

    PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature

    arXiv:2605.02720v1 (new): Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophth…

  4. arXiv cs.CV TIER_1 · Chengan Che, Chao Wang, Jiayuan Huang, Xinyue Chen, Luis C. Garcia-Peraza-Herrera

    Can LLM-Generated Text Empower Surgical Vision-Language Pre-training?

    arXiv:2604.18134v2 (replace): Recent advancements in self-supervised learning have led to powerful surgical vision encoders capable of spatiotemporal understanding. However, extending these visual foundations to multi-modal reasoning tasks is severely bottle…

  5. arXiv cs.CV TIER_1 · Philipp Berens

    PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature

    Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophthalmological image-caption pairs extracted from 1…