PulseAugur
English(EN) SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks

New datasets and models advance sign language recognition and translation

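Researchers have developed new approaches to sign language recognition and translation. One approach uses a deep learning pipeline that combines a VideoMAE video Transformer, which classifies sign gestures into English words, with Meta AI's NLLB-200 model, which translates those words into Indian languages such as Hindi, Telugu, and Bengali. Another development is the SignNet-1M dataset, which aims to improve the robustness of sign language models by synthesizing realistic variation in viewpoint, background, and signer identity using techniques such as 3D Gaussian splatting and diffusion models. The dataset and its accompanying benchmarks are intended to improve generalization on tasks such as translation and recognition under real-world conditions.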
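The two-stage flow described above (classify each sign clip into an English word, then translate the result) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the `SignTranslationPipeline` class, the stub classifier and translator, and the choice to join the words into one string before translation are all assumptions made for the example. The language codes are the FLORES-200 codes that NLLB-200 uses. In a real system the stubs would be replaced by a VideoMAE-based classifier and an NLLB-200 translator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# FLORES-200 codes used by NLLB-200 for the languages named in the summary.
NLLB_CODES: Dict[str, str] = {
    "English": "eng_Latn",
    "Hindi": "hin_Deva",
    "Telugu": "tel_Telu",
    "Bengali": "ben_Beng",
}

Clip = Sequence[object]                       # one video clip (frames left opaque)
Classifier = Callable[[Clip], str]            # clip -> English word
Translator = Callable[[str, str, str], str]   # (text, source code, target code) -> text


@dataclass
class SignTranslationPipeline:
    """Hypothetical wrapper that chains a classifier and a translator."""
    classify: Classifier
    translate: Translator

    def run(self, clips: List[Clip], target_language: str) -> str:
        target_code = NLLB_CODES[target_language]  # KeyError if unsupported
        # Stage 1: one English word per sign clip.
        words = [self.classify(clip) for clip in clips]
        # Assumption: words are joined into one string before translation.
        english = " ".join(words)
        # Stage 2: translate English text into the target language.
        return self.translate(english, NLLB_CODES["English"], target_code)


def stub_classifier(clip: Clip) -> str:
    # Stand-in for a video model: treats the first element as the label.
    return str(clip[0])


def stub_translator(text: str, src: str, tgt: str) -> str:
    # Stand-in for a translation model: only tags the text with the codes.
    return f"[{src}->{tgt}] {text}"


pipeline = SignTranslationPipeline(stub_classifier, stub_translator)
result = pipeline.run([["hello"], ["friend"]], "Hindi")
print(result)  # [eng_Latn->hin_Deva] hello friend
```

Passing the two stages in as plain callables keeps the sketch runnable without downloading any model weights, and lets either stage be swapped independently.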

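Impact: Advances in sign language recognition and translation models could significantly improve accessibility for the deaf and hard-of-hearing community.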

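Ranking rationale: This cluster contains two research papers detailing new datasets and methods for sign language recognition and translation.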


AI-generated summary · Google Gemini · from 4 sources.


Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Ramesh Nandipalli ·

    Deep Learning-Based Video Sign Language Recognition and Cross-Lingual Translation to Native Indian Languages

    Sign language is a primary mode of communication for the global deaf and hard-of-hearing community, yet automated tools that recognize sign gestures from video and translate them into natural language text remain limited, particularly for low-resource Indian languages. We present…

  2. arXiv cs.CV TIER_1 English(EN) · Jianhe Low, Alexandre Symeonidis-Herzig, Maksym Ivashechkin, Ozge Mercanoglu Sincan, Richard Bowden ·

    SignSparK: Efficient Multilingual Sign Language Production via Sparse Keyframe Learning

    arXiv:2603.10446v4. Sign Language Production (SLP) faces a fundamental trade-off: direct text-to-pose models suffer from regression-to-the-mean effects, while dictionary-retrieval methods produce disjointed transitions. To resolve this, we propose …

  3. arXiv cs.CV TIER_1 English(EN) · Zhewen He, Junyi Hu, Haomian Huang, Zhenhua Li, Yu-Shen Liu, Yi Fang ·

    SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks

    arXiv:2606.24361v1. Sign language models are typically trained on datasets captured under constrained conditions, with limited viewpoint, background, and signer-identity diversity, leading to poor robustness under real-world distribution shifts. We int…

  4. arXiv cs.CV TIER_1 English(EN) · Yi Fang ·

    SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks

    Sign language models are typically trained on datasets captured under constrained conditions, with limited viewpoint, background, and signer-identity diversity, leading to poor robustness under real-world distribution shifts. We introduce SignNet-1M, a large-scale augmented datas…