PulseAugur
research · [5 sources]

New AI models and datasets advance vision-language capabilities in ophthalmology and surgery

Researchers have introduced PubMed-Ophtha, a new dataset of 102,023 ophthalmology image-caption pairs extracted from the scientific literature, aiming to address the scarcity of high-quality training data for medical vision-language models. The extraction pipeline decomposes multi-panel figures from PDFs, annotates imaging modalities, and uses LLM-based caption splitting to align caption segments with individual panels, achieving high accuracy on detection and extraction tasks.
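The caption-splitting step can be illustrated with a simplified sketch. The paper's pipeline uses an LLM for this; the regex heuristic below is a hypothetical stand-in (not the authors' implementation) that shows the expected input/output shape — a compound multi-panel caption mapped to per-panel captions keyed by panel letter:

```python
import re
from typing import Dict

def split_compound_caption(caption: str) -> Dict[str, str]:
    """Split a multi-panel figure caption such as
    '(A) Fundus photograph. (B) OCT scan.' into per-panel captions.

    Simplified illustration only: PubMed-Ophtha uses LLM-based
    splitting, which also handles free-form panel references that
    this regex would miss.
    """
    # Split on panel markers like "(A)", keeping the letter
    # via the capturing group at odd indices of the result.
    parts = re.split(r"\(([A-Z])\)\s*", caption)
    panels: Dict[str, str] = {}
    for i in range(1, len(parts) - 1, 2):
        panels[parts[i]] = parts[i + 1].strip()
    return panels
```

A caption with no recognizable panel markers yields an empty mapping, which a real pipeline would route to single-panel handling instead.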

Summary written by gemini-2.5-flash-lite from 5 sources.

IMPACT Provides a large-scale, annotated dataset to accelerate the development of specialized vision-language models in ophthalmology.

RANK_REASON The cluster describes the release of a new dataset and associated models for research purposes.


COVERAGE [5]

  1. arXiv cs.CV TIER_1 · Abdelrahman Zaian, Sheethal Bhat, Mohamed Abdalkader, Andreas Maier

    Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

    arXiv:2605.06173v1 (new): Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We pr…

  2. arXiv cs.CV TIER_1 · Andreas Maier

    Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

    Diabetic Retinopathy (DR) is a leading cause of preventable blindness among working-age adults worldwide, yet most automated screening systems are limited to image-level classification and lack clinically structured reporting. We propose Retina-RAG, a low-cost modular framework t…

  3. arXiv cs.CV TIER_1 · Verena Jasmin Hallitschke, Carsten Eickhoff, Philipp Berens

    PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature

    arXiv:2605.02720v1 (new): Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophth…

  4. arXiv cs.CV TIER_1 · Chengan Che, Chao Wang, Jiayuan Huang, Xinyue Chen, Luis C. Garcia-Peraza-Herrera

    Can LLM-Generated Text Empower Surgical Vision-Language Pre-training?

    arXiv:2604.18134v2 (replace): Recent advancements in self-supervised learning have led to powerful surgical vision encoders capable of spatiotemporal understanding. However, extending these visual foundations to multi-modal reasoning tasks is severely bottle…

  5. arXiv cs.CV TIER_1 · Philipp Berens

    PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature

    Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophthalmological image-caption pairs extracted from 1…