PulseAugur / Brief

Brief

last 24h
[3/3] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Raon-Speech Technical Report

    Researchers have introduced Raon-Speech, a 9-billion-parameter speech language model that can understand, answer, and generate speech in English and Korean. Trained on over 1.38 million hours of curated speech and text data, the model outperforms similarly sized audio foundation models on speech-centric tasks while retaining strong text-based question-answering abilities. An extension, Raon-SpeechChat, adds real-time, full-duplex conversation through further training on dialogue data, showing strengths in turn-taking and interruption sensitivity.

    IMPACT Outperforms similarly sized audio foundation models on speech-centric tasks, potentially improving human-computer interaction and real-time conversational AI.

  2. AffectVerse: Emotional World Models for Multimodal Affective Computing

    Researchers have introduced AffectVerse, a new multimodal model designed for affective computing that integrates temporal prediction into its reasoning process. Unlike previous models that treated emotion recognition statically, AffectVerse uses an Emotion World Module (EWM) to imagine and predict future affective states based on past multimodal cues. This predictive capability, achieved through cross-modal temporal imagination and belief aggregation, reportedly improves performance by at least 2.57% across nine benchmarks.

    IMPACT Introduces a novel approach to affective computing by incorporating temporal prediction, potentially improving how AI systems understand and respond to human emotions.

  3. Stage-adaptive Token Selection for Efficient Omni-modal LLMs

    Researchers have developed SEATS, a new method to make omni-modal large language models (om-LLMs) more efficient. SEATS prunes redundant audio-visual tokens throughout the model's layers, adapting the token selection process based on cross-modal fusion. This approach significantly reduces computational load and speeds up inference while maintaining high performance.

    IMPACT Reduces computational overhead and speeds up inference for multi-modal LLMs, potentially lowering deployment costs.
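
    The general idea of layer-dependent token pruning can be illustrated with a minimal sketch. This is not the SEATS algorithm: the summary does not describe its scoring or schedule, so the linear keep-ratio schedule, the per-token importance scores, and the function names below are all hypothetical.

```python
from typing import List, Sequence


def keep_ratio_for_layer(layer: int, num_layers: int,
                         early_ratio: float = 1.0,
                         late_ratio: float = 0.25) -> float:
    """Hypothetical schedule: interpolate linearly from early_ratio
    at the first layer to late_ratio at the last layer."""
    if num_layers <= 1:
        return late_ratio
    t = layer / (num_layers - 1)
    return early_ratio + t * (late_ratio - early_ratio)


def prune_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring tokens, in original order.

    `scores` stands in for whatever per-token importance signal a real
    method would compute (assumed here, not taken from the paper).
    """
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Example: keep half of four audio-visual tokens by score.
kept = prune_tokens([0.1, 0.9, 0.3, 0.7], keep_ratio=0.5)
```

    With a schedule like this, early layers keep every token and later layers keep progressively fewer, which is one simple way a pruning budget could vary by stage.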