New method generates synthetic patient data for scarce medical datasets

By PulseAugur Editorial · [1 source] · 2026-06-24 12:45

Researchers have developed a patient augmentation technique for data-scarce Multiple Instance Learning (MIL) in medical applications, where each patient is treated as a bag of instance embeddings with a single patient-level label. The method generates realistic synthetic patients directly in embedding space, using Gaussian Mixture Models (GMMs) to learn disease-specific instance distributions. It can create new patients by remixing pooled embeddings, even when examples from some categories are unavailable, and it uses uncertainty quantification to select which generated patients are added to training. Experiments across several scarcity scenarios, including cross-dataset transfer and low-data regimes for single-cell RNA-seq and flow cytometry, show improved performance over existing methods; in one scenario, performance was comparable to training on the full dataset.
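
The summary does not spell out the algorithm, so what follows is only a minimal Python/NumPy sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each patient is a bag of instance embeddings (an array of shape `(n_instances, d)`), fits one diagonal-covariance GMM per disease class to that class's pooled instances, and samples new bags from it. The "remix" step here is a hypothetical stand-in: each synthetic patient draws its own component proportions from a Dirichlet centred on the fitted GMM weights. The uncertainty step is also an assumption: a toy nearest-centroid classifier on mean-pooled bags replaces a trained MIL model, and a candidate is kept only if it is predicted as its intended class, ranked by lowest predictive entropy. All function names (`fit_diag_gmm`, `sample_patient`, `predict_proba`, `augment`) and parameters (`k`, `alpha`, `bag_size`) are illustrative, and the missing-category case is not modelled.

```python
import numpy as np


def fit_diag_gmm(X, k, n_iter=50, seed=0):
    """Fit a diagonal-covariance Gaussian mixture to the rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]  # initialise on random instances
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log-density of every instance under every component, plus log weight
        log_p = -0.5 * (((X[:, None, :] - means[None]) ** 2) / variances[None]
                        + np.log(2 * np.pi * variances[None])).sum(axis=2)
        log_p += np.log(weights)[None]
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and per-dimension variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-6)
    return weights, means, variances


def sample_patient(gmm, n_instances, rng, alpha=10.0):
    """Sample one synthetic bag; the Dirichlet 'remix' of component weights is hypothetical."""
    weights, means, variances = gmm
    mix = rng.dirichlet(np.maximum(alpha * weights, 1e-2))  # patient-specific proportions
    comps = rng.choice(len(mix), size=n_instances, p=mix)
    noise = rng.standard_normal((n_instances, means.shape[1]))
    return means[comps] + noise * np.sqrt(variances[comps])


def predict_proba(bags, centroids):
    """Toy stand-in for a trained MIL model: softmax over negative squared distances
    between each mean-pooled bag and the per-class instance centroids."""
    pooled = np.stack([b.mean(axis=0) for b in bags])
    logits = -((pooled[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def augment(bags, labels, n_new_per_class, n_keep_per_class, k=3, bag_size=50, seed=0):
    """Generate candidate patients per class, then keep the most confident ones."""
    rng = np.random.default_rng(seed)
    classes = sorted(set(labels))
    pooled = {c: np.concatenate([b for b, y in zip(bags, labels) if y == c]) for c in classes}
    gmms = {c: fit_diag_gmm(pooled[c], k, seed=seed) for c in classes}
    centroids = np.stack([pooled[c].mean(axis=0) for c in classes])
    new_bags, new_labels = [], []
    for idx, c in enumerate(classes):
        cand = [sample_patient(gmms[c], bag_size, rng) for _ in range(n_new_per_class)]
        probs = predict_proba(cand, centroids)
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        ok = np.flatnonzero(probs.argmax(axis=1) == idx)  # must be predicted as class c
        keep = ok[np.argsort(entropy[ok])][:n_keep_per_class]  # lowest entropy first
        new_bags += [cand[i] for i in keep]
        new_labels += [c] * len(keep)
    return new_bags, new_labels


# Usage on toy 2-D "embeddings": two classes, four patients each, 30 instances per patient.
rng = np.random.default_rng(1)
bags = [rng.normal(0.0, 1.0, (30, 2)) for _ in range(4)] + \
       [rng.normal(3.0, 1.0, (30, 2)) for _ in range(4)]
labels = [0] * 4 + [1] * 4
new_bags, new_labels = augment(bags, labels, n_new_per_class=20, n_keep_per_class=5)
print(len(new_bags), new_labels)
```

Keeping low-entropy candidates favours synthetic patients that the current model already places firmly in their intended class; the paper may use a different uncertainty measure or selection rule, for example one that favours more informative, higher-uncertainty samples.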

IMPACT This method could improve diagnostic models for rare diseases by making effective training possible with limited data.

RANK_REASON Research paper published on arXiv detailing a new method for data augmentation in medical machine learning.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG · English (EN) · Ario Sadafi · 2026-06-24 12:45

Re-mixing Embeddings for Patient Augmentation in Data Scarce Multiple Instance Learning

Data scarcity is a major bottleneck in medical Multiple Instance Learning (MIL), especially for rare diseases or expensive modalities. We introduce a statistically grounded patient augmentation approach that generates realistic patients directly in embedding space. Using Gaussian…