PulseAugur / Brief

Last 24 hours, 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. MuVAP: Multimodal Multiparty Voice Activity Projection for Turn-taking Prediction in the Wild

    Researchers have introduced MuVAP, a multimodal framework for predicting turn-taking in multiparty conversations. It extends Voice Activity Projection by combining acoustic predictions with face tracking, using only a single camera and a monaural audio stream, which makes it suitable for human-robot interaction. To handle multiple speakers, MuVAP employs Role-Relative Projection. The framework is validated on the newly created Audio-Visual Conversation Corpus, a 31-hour dataset of unedited conversations, and is reported to outperform existing baselines on turn-taking prediction tasks.

    IMPACT This framework could enhance human-robot interaction by enabling more natural turn-taking in conversations.