PulseAugur
Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection

New framework improves driver distraction detection through multi-modal video alignment

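Researchers have developed a new multi-modal video representation alignment framework that improves self-supervised learning for driver distraction detection. Data from multiple sensors can be noisy or faulty, and the method addresses this by jointly modeling unreliable positives and unreliable negatives. It uses soft targets and a similarity-based weighting mechanism to achieve principled global multi-modal alignment, and it outperforms existing baselines on the Drive&Act dataset.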
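The summary does not give the paper's actual loss. As a rough illustration only, here is one generic way to combine soft targets with similarity-based weighting in a two-modality contrastive loss. The function names, the use of intra-modal similarity to build the targets, the cosine-agreement weight, and the temperatures `tau` and `target_tau` are all assumptions made for this sketch, not the authors' formulation.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Project each row onto the unit sphere so dot products are cosine similarities
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    # Numerically stable row-wise log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def soft_targets(za, zb, target_tau=0.1):
    # Hypothetical choice: target row i spreads mass over clips that look like
    # clip i within each modality, averaged over both modalities
    za, zb = l2_normalize(za), l2_normalize(zb)
    ta = np.exp(log_softmax(za @ za.T / target_tau))
    tb = np.exp(log_softmax(zb @ zb.T / target_tau))
    return 0.5 * (ta + tb)


def reliability_weights(za, zb):
    # Hypothetical per-pair weight: cosine agreement between the two views of a
    # clip, clipped to [0, 1], so a pair whose sensors disagree counts for less
    za, zb = l2_normalize(za), l2_normalize(zb)
    return np.clip(np.sum(za * zb, axis=1), 0.0, 1.0)


def soft_alignment_loss(za, zb, tau=0.1, target_tau=0.1):
    """Weighted soft-target cross-entropy from modality A to modality B.

    za, zb: (N, D) arrays holding embeddings of the same N clips from two
    modalities. Targets and weights are plain constants here (no gradient).
    """
    logits = l2_normalize(za) @ l2_normalize(zb).T / tau
    targets = soft_targets(za, zb, target_tau)
    weights = reliability_weights(za, zb)
    per_clip = -(targets * log_softmax(logits)).sum(axis=1)
    return float((weights * per_clip).sum() / (weights.sum() + 1e-8))
```

Compared with the one-hot targets of standard InfoNCE, a near-duplicate clip in this sketch receives part of the target mass instead of being pushed away, and a clip whose two views disagree (for example because one sensor failed) is down-weighted rather than forced into alignment. In an actual training loop, the targets and weights would usually be computed without gradients.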

Impact: Strengthens the robustness of AI systems on real-world multi-modal video understanding tasks such as driver safety.

Ranking rationale: This cluster contains an academic paper that details a new self-supervised learning method for computer vision.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

  1. arXiv cs.CV TIER_1 · David J. Lerch, Livien Majer, Zeyun Zhong, Manuel Martin, Frederik Diederichs, Rainer Stiefelhagen

    Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection

    arXiv:2606.02352v1 · Announce Type: new · Abstract: Robust self-supervised learning of multi-modal video representations is critical for real-world applications such as driver distraction detection, where multiple sensors provide complementary but noisy signals. Conventional contrast…

  2. arXiv cs.CV TIER_1 · Rainer Stiefelhagen

    Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection

    Robust self-supervised learning of multi-modal video representations is critical for real-world applications such as driver distraction detection, where multiple sensors provide complementary but noisy signals. Conventional contrastive objectives, such as InfoNCE, assume all nega…
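For readers unfamiliar with the InfoNCE baseline that the abstract mentions, here is a minimal sketch of a standard symmetric InfoNCE loss between two modalities. It assumes that row i of each array holds the embedding of the same clip from two sensors, such as two camera views. Row i of one modality is the only positive for row i of the other, and every other clip in the batch is treated as a negative. The function names and the temperature `tau` are illustrative choices, not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Unit-normalize rows so dot products are cosine similarities
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    # Numerically stable row-wise log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(za, zb, tau=0.1):
    """Symmetric cross-modal InfoNCE with one-hot (diagonal) targets.

    za, zb: (N, D) embeddings of the same N clips from two modalities.
    Every off-diagonal pair counts as a negative, even if the two clips
    show the same activity.
    """
    za, zb = l2_normalize(za), l2_normalize(zb)
    logits = za @ zb.T / tau
    a_to_b = -np.mean(np.diag(log_softmax(logits)))    # A queries B
    b_to_a = -np.mean(np.diag(log_softmax(logits.T)))  # B queries A
    return float(0.5 * (a_to_b + b_to_a))
```

For example, `dup = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])` gives `info_nce(dup, dup)` of about 0.347 (log(2)/2), even though each clip is perfectly aligned with its own other-modality view. The loss stays above zero because the duplicated clip is scored as a negative for its twin.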