Brief

last 24h

[3/3] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
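
The brief names the four scoring signals but not how they are combined. As a rough illustration only, the sketch below assumes a weighted sum of three signals in [0, 1] multiplied by an exponential time decay; the weights, the 24-hour half-life, and the function names are all hypothetical, not the site's actual formula.

```python
# Hypothetical weights: the brief lists the signals but not their relative importance.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay term


def time_decay(age_hours, half_life_hours=HALF_LIFE_HOURS):
    """Exponential decay: 1.0 for a brand-new item, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life_hours)


def score_item(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals (each clamped to [0, 1]) with time decay into a 0-100 score."""
    signals = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    base = sum(
        WEIGHTS[name] * min(max(value, 0.0), 1.0)
        for name, value in signals.items()
    )
    return round(100.0 * base * time_decay(age_hours), 1)
```

Under these assumptions, an item with perfect signals scores 100 when fresh and half that after one half-life; a real ranking system could just as plausibly add the decay term or use different weights.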

TOOL · arXiv cs.AI English(EN) · 16h

Raon-Speech Technical Report

Researchers have introduced Raon-Speech, a 9-billion-parameter speech language model that can understand, answer, and generate speech in English and Korean. Trained on over 1.38 million hours of curated speech and text data, it reportedly outperforms similarly sized audio foundation models on speech-centric tasks while retaining strong text-based question-answering ability. An extension, Raon-SpeechChat, adds real-time, full-duplex conversation through further training on dialogue data and shows strengths in turn-taking and interruption sensitivity.

IMPACT A bilingual speech language model that reportedly outperforms similarly sized audio foundation models on speech tasks, potentially improving human-computer interaction and real-time conversational AI.

TOOL · arXiv cs.CV English(EN) · 1w

AffectVerse: Emotional World Models for Multimodal Affective Computing

Researchers have introduced AffectVerse, a multimodal model for affective computing that integrates temporal prediction into its reasoning process. Where previous models treated emotion recognition as a static task, AffectVerse uses an Emotion World Module (EWM) to imagine and predict future affective states from past multimodal cues. This predictive capability, achieved through cross-modal temporal imagination and belief aggregation, reportedly improves performance by at least 2.57% across nine benchmarks.

IMPACT Introduces a novel approach to affective computing by incorporating temporal prediction, potentially improving how AI systems understand and respond to human emotions.

RESEARCH · Hugging Face Daily Papers English(EN) · 1w · [2 sources]

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Researchers have developed SEATS, a method for making omni-modal large language models (om-LLMs) more efficient. SEATS prunes redundant audio-visual tokens throughout the model's layers, adapting the token selection at each stage based on cross-modal fusion. The approach reportedly reduces computational load and speeds up inference while maintaining high performance.

IMPACT Reduces computational overhead and speeds up inference for multi-modal LLMs, potentially lowering deployment costs.
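
The general idea of pruning tokens in stages can be sketched as follows. This is not the SEATS algorithm: the brief does not describe its scoring criterion, so the per-stage scoring functions, the keep ratios, and the names `prune_tokens` and `staged_prune` are hypothetical stand-ins that only illustrate top-k selection applied repeatedly across stages.

```python
import math


def prune_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens:
        return []
    keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Indices of the top-`keep` scores, then re-sorted to restore sequence order.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def staged_prune(tokens, stages):
    """Apply pruning once per stage.

    `stages` is a list of (score_fn, keep_ratio) pairs; each score_fn maps a
    token to an importance score and stands in for whatever signal a real
    model would compute at that depth (an assumption, not the paper's method).
    """
    for score_fn, keep_ratio in stages:
        scores = [score_fn(token) for token in tokens]
        tokens = prune_tokens(tokens, scores, keep_ratio)
    return tokens
```

Because later stages only see the tokens that survived earlier ones, the sequence shrinks progressively, which is where the compute savings in this kind of scheme would come from.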
