Researchers have developed a new attention-based multimodal model designed to handle situations where some sensor data is missing during both training and inference. This model, formulated as a conditional variational autoencoder (CVAE) with a transformer backbone, learns a unified representation even with incomplete modalities. Experiments on five datasets across human trajectory prediction and robot manipulation forecasting show its effectiveness in learning from incomplete data and outperforming existing multimodal fusion methods. AI
IMPACT This model could improve the robustness of AI systems in real-world robotic applications where sensor data is often incomplete.
RANK_REASON This is a research paper describing a novel model for a specific machine learning problem. [lever_c_demoted from research: ic=1 ai=1.0]
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →