PulseAugur

New benchmark FutureOmni tests multimodal LLMs on future forecasting

Researchers have introduced FutureOmni, a benchmark that evaluates how well multimodal large language models (MLLMs) forecast future events. The benchmark focuses on audio-visual environments and requires models to reason across modalities and draw on internal knowledge to predict what happens next. Current MLLMs struggle with the task: the best-performing model, Gemini 3 Flash, reaches only 64.8% accuracy. To address this, the researchers developed an instruction-tuning dataset and an Omni-Modal Future Forecasting (OFF) training strategy, which improved both future forecasting and generalization.

IMPACT This benchmark and training strategy could lead to more capable multimodal models that better understand audio-visual context and predict future events from it.

RANK_REASON The cluster contains a research paper introducing a new benchmark and training strategy for multimodal LLMs.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Qian Chen, Jinlan Fu, Changsong Li, Min Zhang, See-Kiong Ng, Xipeng Qiu

    FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

    arXiv:2601.13836v2 Announce Type: replace Abstract: Although Multimodal Large Language Models (MLLMs) demonstrate strong omni-modal perception, their ability to forecast future events from audio-visual cues remains largely unexplored, as existing benchmarks focus mainly on retros…