Researchers have developed SEATS, a method for improving the efficiency of omni-modal large language models (om-LLMs). These models process interleaved audio-visual tokens alongside text, and this dense input creates significant computational overhead. SEATS addresses the problem by adaptively selecting and pruning non-textual tokens at different stages of the LLM. In the reported experiments, keeping only 10% of the visual and audio tokens reduces FLOPs by 9.3x and speeds up prefill by 4.8x while retaining over 96% of the original performance.
Summary written by gemini-2.5-flash-lite from 1 source.
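The pruning idea in the summary above can be illustrated with a minimal sketch. It is not the SEATS algorithm: the summary does not say how tokens are scored or how the keep ratio varies across LLM stages, so the per-token `scores` input, the single pruning pass, and the function name `prune_non_text_tokens` are all assumptions made for illustration. The sketch only shows the general pattern of keeping every text token and a fixed fraction (here 10%) of the audio and visual tokens.

```python
from typing import List, Sequence


def prune_non_text_tokens(
    is_text: Sequence[bool],
    scores: Sequence[float],
    keep_ratio: float = 0.10,
) -> List[int]:
    """Return the sorted indices of the tokens that survive pruning.

    Hypothetical helper, not taken from the SEATS paper. Every text token is
    kept; among non-text (audio/visual) tokens, only the top `keep_ratio`
    fraction by score is kept (at least one, if any exist).
    """
    if len(is_text) != len(scores):
        raise ValueError("is_text and scores must have the same length")

    non_text = [i for i, t in enumerate(is_text) if not t]
    budget = max(1, round(len(non_text) * keep_ratio)) if non_text else 0

    # Rank non-text tokens by descending score; ties go to the earlier position.
    ranked = sorted(non_text, key=lambda i: (-scores[i], i))

    kept = set(ranked[:budget])
    kept.update(i for i, t in enumerate(is_text) if t)

    # Return indices in original order so the interleaved layout is preserved.
    return sorted(kept)


if __name__ == "__main__":
    # 20 audio/visual tokens followed by 3 text tokens (toy data).
    is_text = [False] * 20 + [True] * 3
    scores = [i / 20 for i in range(20)] + [0.0] * 3
    print(prune_non_text_tokens(is_text, scores))
```

With the toy input, the two highest-scoring non-text tokens and all three text tokens are retained, so the sequence shrinks from 23 tokens to 5. A stage-wise method would repeat a selection like this at several points in the model, with its own scoring rule and budget at each point.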
IMPACT This method could significantly reduce the computational cost of processing multi-modal data in LLMs, enabling wider adoption and faster inference for applications requiring audio-visual understanding.
RANK_REASON Publication of an academic paper detailing a new method for improving LLM efficiency.