PulseAugur / Brief


Last 24 hours, 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
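The brief does not publish its scoring formula. As a purely illustrative sketch, assume each of the first three factors is normalized to [0, 1] and combined with hypothetical weights, and that time decay is an exponential term with a hypothetical 12-hour half-life. The `Cluster` fields, `WEIGHTS`, and `HALF_LIFE_HOURS` below are assumptions, not PulseAugur's actual method.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    authority: float         # hypothetical: 0..1, credibility of the covering sources
    cluster_strength: float  # hypothetical: 0..1, e.g. how many sources cover the story
    headline_signal: float   # hypothetical: 0..1, strength of the headline itself
    age_hours: float         # hours since the story first appeared


# Hypothetical weights; they sum to 1, so the weighted sum stays in [0, 1].
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0  # hypothetical: the score halves every 12 hours


def score(c: Cluster) -> int:
    """Combine the factors into a 0-100 score, discounted by story age."""
    base = (
        WEIGHTS["authority"] * c.authority
        + WEIGHTS["cluster_strength"] * c.cluster_strength
        + WEIGHTS["headline_signal"] * c.headline_signal
    )
    decay = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)  # exponential time decay
    return round(100 * base * decay)
```

Under these assumptions, a story that maxes out every factor scores 100 when fresh and 50 after one half-life. A multiplicative decay means older stories fall in rank regardless of how strong their other signals are.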

  1. Stage-adaptive Token Selection for Efficient Omni-modal LLMs

    Researchers have developed SEATS, a method for making omni-modal large language models (om-LLMs) more efficient. SEATS prunes redundant audio-visual tokens at multiple stages across the model's layers, adapting which tokens it keeps to the stage of cross-modal fusion. The approach substantially reduces computational load and speeds up inference while maintaining strong performance.

    IMPACT Reduces computational overhead and speeds up inference for omni-modal LLMs, potentially lowering deployment costs.
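The summary does not give SEATS's actual selection rule. As a generic illustration of stage-wise token pruning (not the SEATS algorithm), the sketch below keeps, at each stage, the highest-scoring fraction of tokens. Here `score_fn`, the per-stage schedule `STAGE_KEEP_RATIOS`, and the idea of passing the stage index to the scorer are hypothetical; a real scorer might use, for example, attention from text tokens to audio-visual tokens.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical schedule: keep every token early (before cross-modal fusion),
# then prune more aggressively in later stages.
STAGE_KEEP_RATIOS = [1.0, 0.5, 0.25]


def select_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the top `keep_ratio` fraction of tokens by score,
    sorted back into original (temporal) order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune_across_stages(
    tokens: list,
    score_fn: Callable[[int, list], Sequence[float]],
) -> Tuple[list, List[int]]:
    """Prune tokens stage by stage. `score_fn(stage, tokens)` is a hypothetical
    scorer that may behave differently per stage (the "stage-adaptive" part).
    Returns the surviving tokens and the token count after each stage."""
    history = []
    for stage, ratio in enumerate(STAGE_KEEP_RATIOS):
        scores = score_fn(stage, tokens)
        kept = select_tokens(scores, ratio)
        tokens = [tokens[i] for i in kept]
        history.append(len(tokens))
    return tokens, history
```

With eight tokens scored by their own value, the schedule keeps 8, then 4, then 1 token. Re-sorting the kept indices preserves temporal order, which matters for audio and video streams.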

  2. OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

    Researchers have introduced OmniPro and VideoOdyssey, two new benchmarks for evaluating how well omni-modal large language models understand long, complex video. OmniPro targets proactive streaming video understanding: it tests whether a model can decide when to speak and what to say based on incoming audio-visual streams, using 2,700 human-verified samples across a range of tasks. VideoOdyssey targets ultra-long-context video understanding, with videos averaging 109 minutes, and evaluates continuous reasoning and memory retention over extended periods. Results on both benchmarks expose current models' weaknesses in long-horizon robustness, audio utilization, and fine-grained perception, particularly for non-speech audio.

    IMPACT These benchmarks could help drive the development of AI models that understand complex, long-form video, a capability that matters for applications such as surveillance, content analysis, and autonomous systems.