Fusion is not one-size-fits-all: Cross-Modal Representation Alignment for Time-to-Event Modeling
Researchers have developed a framework for cross-modal representation alignment that improves time-to-event (TTE) prediction from CT imaging combined with longitudinal electronic health records (EHR). The foundation-model-driven approach addresses modality imbalance and distribution shift by aligning the two modalities in a shared latent space, and it compares several fusion strategies for doing so. The framework showed consistent improvements in prediction accuracy across different TTE tasks, particularly pulmonary embolism mortality, with contrastive multimodal fusion giving robust results.
AI IMPACT: The work presents task-aware multimodal alignment as a key principle for robust generalization in clinical TTE prediction.
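
To make the idea of contrastive alignment in a shared latent space more concrete, here is a minimal sketch of a symmetric InfoNCE-style loss between paired CT and EHR embeddings. The summary does not state which contrastive objective the authors use, so the loss form, the `temperature` value, the function names, and the synthetic vectors standing in for encoder outputs are all assumptions made for illustration.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(ct_emb, ehr_emb, temperature=0.07):
    """Hypothetical alignment loss: matching CT/EHR pairs are pulled together,
    mismatched pairs in the same batch are pushed apart (assumed objective)."""
    ct = [l2_normalize(v) for v in ct_emb]
    ehr = [l2_normalize(v) for v in ehr_emb]
    # logits[i][j] = scaled cosine similarity between CT sample i and EHR sample j
    logits = [[sum(a * b for a, b in zip(c, e)) / temperature for e in ehr] for c in ct]
    transposed = [list(col) for col in zip(*logits)]
    # Average the CT-to-EHR and EHR-to-CT directions
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Synthetic stand-ins for encoder outputs (not real CT or EHR data)
rng = random.Random(0)


def random_batch(n=8, dim=16):
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]


ct_batch = random_batch()
paired_ehr = [[v + 0.01 * rng.gauss(0, 1) for v in row] for row in ct_batch]
unpaired_ehr = random_batch()

aligned_loss = symmetric_info_nce(ct_batch, paired_ehr)
unaligned_loss = symmetric_info_nce(ct_batch, unpaired_ehr)
print(f"aligned: {aligned_loss:.4f}, unaligned: {unaligned_loss:.4f}")
```

In this toy setup the loss is lower when each EHR vector sits close to its paired CT vector than when the two batches are unrelated, which is the behavior an alignment objective of this kind is meant to reward. A real pipeline would feed the aligned representations into a TTE head, which this sketch leaves out.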