New SEATS method slashes omni-modal LLM compute by 9.3x

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have developed SEATS, a method for improving the efficiency of omni-modal large language models (om-LLMs). These models process interleaved audio-visual tokens alongside text, and the dense non-textual input creates significant computational overhead. SEATS addresses this by adaptively selecting and pruning non-textual tokens at different stages of the LLM. In the reported experiments, keeping only 10% of the visual and audio tokens reduces FLOPs by 9.3x and speeds up prefill by 4.8x while retaining over 96% of the original performance.
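To make the idea of pruning non-textual tokens concrete, here is a minimal Python sketch of a single selection step. It is not the SEATS algorithm: the summary does not describe how SEATS scores tokens or how its selection varies across stages, so the function name `prune_non_text_tokens`, the per-token `scores`, and the single-pass top-k rule are all assumptions made for illustration. The sketch keeps every text token and retains only the highest-scoring fraction of audio-visual tokens.

```python
def prune_non_text_tokens(is_text, scores, keep_ratio=0.10):
    """Return indices of tokens to keep, in original sequence order.

    is_text:    list[bool], True where the token is a text token (always kept).
    scores:     list[float], hypothetical importance score per token
                (an assumption; the paper's criterion is not given here).
    keep_ratio: fraction of non-text tokens to retain.
    """
    if len(is_text) != len(scores):
        raise ValueError("is_text and scores must have the same length")
    non_text = [i for i, t in enumerate(is_text) if not t]
    # Keep at least one non-text token whenever any exist.
    k = max(1, round(keep_ratio * len(non_text))) if non_text else 0
    # Highest-scoring non-text tokens; ties go to the earlier position.
    top = sorted(non_text, key=lambda i: (-scores[i], i))[:k]
    keep = set(top) | {i for i, t in enumerate(is_text) if t}
    return sorted(keep)


# Toy sequence: 20 audio-visual tokens followed by 4 text tokens.
is_text = [False] * 20 + [True] * 4
scores = [((7 * i) % 20) / 20 for i in range(20)] + [0.0] * 4
kept = prune_non_text_tokens(is_text, scores)
print(kept)
```

With a 10% keep ratio, the toy input shrinks from 24 tokens to 6: the two top-scoring audio-visual tokens plus the four text tokens. A stage-adaptive method would presumably apply different selection rules or ratios at different depths of the model, which this single-step sketch does not attempt to model.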


IMPACT This method could significantly reduce the computational cost of processing multi-modal data in LLMs, enabling wider adoption and faster inference for applications requiring audio-visual understanding.

RANK_REASON Publication of an academic paper detailing a new method for improving LLM efficiency.


paper
infra

COVERAGE [1]

arXiv cs.CV TIER_1 · Xirong Li · 2026-05-19 15:55

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Omni-modal large language models (om-LLMs) achieve unified audio-visual understanding by encoding video and audio into temporally aligned token sequences interleaved at the window level. However, processing these dense non-textual tokens throughout the LLM incurs substantial comp…

