PulseAugur

speech recognition

PulseAugur coverage of speech recognition: every cluster mentioning speech recognition across labs, papers, and developer communities, ranked by signal.

Total · 30 days
14
14 in 90 days
Releases · 30 days
0
0 in 90 days
Papers · 30 days
11
11 in 90 days
Tier distribution · 90 days
Sentiment · 30 days

4 days with sentiment data

Recent · page 1 of 1 · 14 items
  1. TOOL · CL_47294 ·

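    ASR models falter in the real world, need simulated training

    A new study highlights that automatic speech recognition (ASR) models degrade significantly on real-world audio, in sharp contrast to their success in controlled settings. The study indicates that these models struggle with the complexity and variability of natural speech, leading to a large drop in accuracy. To address this, it proposes training ASR models on large datasets of simulated, challenging audio scenarios to improve their robustness and reliability in practical applications.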

  2. RESEARCH · CL_48761 ·

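    AI security research paper calls for stronger incentives for defense

    A recent paper on arXiv highlights a marked imbalance in AI security research: work on attack methods far outpaces work on defensive strategies. The study indicates that attack papers are often evaluated under conditions that exaggerate threat severity, while defensive research faces stricter scrutiny. This disparity leaves the field flooded with vulnerability disclosures but short on practical, deployable protections, so the authors call for stronger incentives for defensive research.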

  3. TOOL · CL_40816 ·

    LLMs show promise for low-resource ASR error correction

    Researchers explored the effectiveness of large language models (LLMs) in correcting errors for low-resource automatic speech recognition (ASR) systems, specifically focusing on West Frisian. Their study introduced a co…

  4. TOOL · CL_38314 ·

    New PAREDA dataset targets ASR improvements for accented speech

    Researchers have introduced PAREDA, a novel dataset designed to improve Automatic Speech Recognition (ASR) systems by capturing real-world speech variations. This dataset features discussions on Natural Language Process…

  5. TOOL · CL_38321 ·

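    New SBPN model improves ASR performance for Nigerian languages via knowledge distillation

    Researchers have developed a new multilingual automatic speech recognition (ASR) framework called Sometin Beta Pass Notin (SBPN) to improve performance on Nigerian languages. The framework uses a two-stage knowledge distillation process: it first distills from monolingual models, then applies iterative self-improvement on pseudo-labeled data. On benchmarks such as Common Voice and Fleurs, the method reduces word error rate by an average of 29% relative to the baseline and outperforms existing state-of-the-art multilingual models. SBPN is released as an open foundation model in two sizes, aiming to…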

  6. TOOL · CL_28317 ·

    New framework proposed for responsible ASR fairness benchmarking

    Researchers have proposed a new framework for evaluating fairness in automatic speech recognition (ASR) systems. The proposed methodology emphasizes the importance of clearly defining the fairness hypothesis and tailori…

  7. TOOL · CL_27534 ·

    LLMs assess psychological crisis levels using speech and paralinguistic cues

    Researchers have developed a new framework using large language models (LLMs) to automatically assess psychological crisis levels from speech. Their method incorporates paralinguistic emotional cues from speech into tex…

  8. TOOL · CL_23676 ·

    MLOps project case study details end-to-end speech recognition system development

    This case study details the development of an end-to-end speech recognition system, emphasizing the critical role of MLOps beyond just model performance. It highlights the necessity of a comprehensive approach to succes…

  9. TOOL · CL_25631 ·

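    New method yields neural network scaling laws that generalize across domains

    Researchers have developed a method for building neural network scaling laws that generalize across domains. These laws predict the relationship between model performance and resources such as data or compute. The new method identifies key invariants that allow a scaling law fitted in one domain to transfer to others, even under transformations that reduce data resolution. This was validated across language, vision, and speech, with accurate predictions for specialized applications such as electronic health records and noisy time-series data.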

  10. TOOL · CL_12782 ·

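    Shotcut 26.4 video editor adds Vulkan GPU support for speech-to-text

    The open-source video editor Shotcut has released version 26.4, bringing significant enhancements for Linux users. The update adds Vulkan GPU support to the speech-to-text feature, which may improve performance and efficiency. Shotcut 26.4 also includes new export presets for 10-bit VP9 MP4 with E-AC-3 audio and 10-bit VP9 WebM with Opus audio.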

  11. RESEARCH · CL_09814 ·

    Speech Representation Models outperform LLMs in pediatric speech disorder classification

    Researchers have developed a hierarchical approach using Speech Representation Models (SRMs) for classifying Speech Sound Disorders (SSD) in children, outperforming current Large Language Model (LLM) based methods. The …

  12. RESEARCH · CL_06649 ·

    New benchmark quantifies LLM API divergence across domains

    Researchers have developed a new framework to measure how much different large language models (LLMs) disagree when they try to find and rank external APIs for tasks. Across various API domains and major model families,…

  13. RESEARCH · CL_06335 ·

    Researchers introduce RAS, a new metric for reliable speech recognition systems

    Researchers have introduced RAS, a new metric designed to evaluate the reliability of automatic speech recognition (ASR) systems. Unlike traditional metrics that focus solely on accuracy, RAS accounts for the system's c…

  14. SIGNIFICANT · CL_44365 ·

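    Together AI launches unified real-time voice agent platform

    Together AI has launched a unified platform for building real-time voice agents, integrating speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) in a single cloud environment. This co-location aims to cut latency to under 500 ms and to simplify deployment by eliminating network hops across vendors. The platform now natively supports models such as Deepgram's STT and Cartesia Sonic-3 for TTS, giving developers more choice and a more streamlined experience for production-ready voice applications.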