PulseAugur

Study finds automated LLM jailbreak judges lack reliability

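Researchers are questioning the reliability of the automated scoring systems used to evaluate jailbreaks of large language models (LLMs). A new study finds that dedicated safety classifiers tend to over-flag attacks as successful, while LLM-based judges show inconsistent recall, so the measured attack success rate varies widely depending on which judge is used. These automated judges are also vulnerable to adversarial manipulation: simple text edits can substantially change the LLM judges' scores, whereas dedicated classifiers are more robust but can still be defeated by white-box attacks. The findings suggest that many reported attack success rates may be unreliable because of the limitations of these automated evaluation methods.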
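
In these papers, attack success rate (ASR) is the fraction of attack attempts that the judge labels as successful, so a judge that over-flags inflates it and a judge with low recall deflates it. The sketch below makes that concrete with hypothetical toy verdicts (not data from the study): it scores two invented judges against human reference labels and reports each one's ASR, precision, and recall. `JudgeReport` and `evaluate_judge` are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class JudgeReport:
    asr: float        # fraction of attempts the judge marks as successful
    precision: float  # of the judge's "success" labels, fraction humans agree with
    recall: float     # of human-labelled successes, fraction the judge catches


def evaluate_judge(judge: list[bool], human: list[bool]) -> JudgeReport:
    """Compare one judge's verdicts with human reference labels (hypothetical helper)."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need equal-length, non-empty verdict lists")
    true_pos = sum(j and h for j, h in zip(judge, human))
    flagged = sum(judge)
    positives = sum(human)
    return JudgeReport(
        asr=flagged / len(judge),
        precision=true_pos / flagged if flagged else 0.0,
        recall=true_pos / positives if positives else 0.0,
    )


# Hypothetical toy verdicts (True = jailbreak judged successful); not from the study.
human      = [True, False, False, True,  False, False, True,  False]
classifier = [True, True,  False, True,  True,  False, True,  True]   # over-flags
llm_judge  = [True, False, False, False, False, False, False, False]  # misses successes

for name, verdicts in [("human", human), ("classifier", classifier), ("llm_judge", llm_judge)]:
    r = evaluate_judge(verdicts, human)
    print(f"{name:>10}: ASR={r.asr:.3f} precision={r.precision:.2f} recall={r.recall:.2f}")
```

On this toy data the over-flagging classifier reports an ASR of 0.75 and the low-recall LLM judge 0.125, against a human-labelled rate of 0.375: the same attempts, three different headline numbers.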

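Impact: Highlights the need for more robust and reliable evaluation metrics in LLM safety research, which could change how model safety is assessed.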

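Ranking rationale: This cluster groups research papers on the limitations and evaluation of automated systems for scoring LLM jailbreaks (measured by attack success rate, ASR) and for handling automatic speech recognition (ASR) errors.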

Read on Towards AI →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →


Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Yang Gao ·

    How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring

    Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number is assigned not by people but by an automated judge: either a safety classifier trained for the task, or a general chat model prompted to grade. The judge is rarely che…

  2. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Mohammad Aref Jafari-Raddani ·

    Error-Aware TF-IDF Retrieval-Augmented Generation for ASR Error Correction

    End-to-end automatic speech recognition systems frequently hallucinate rare entities and domain-specific terms, especially in low-resource languages. While retrieval-augmented generation frameworks can mitigate these errors using large language models, current architectures face …

  3. Towards AI TIER_1 English(EN) · Dmitriy Nikultsev ·

    Why Word Error Rate Is Not Enough: Semantic Decomposition of ASR Errors

    A feasible framework for evaluating ASR models across semantic categories instead of a single aggregate metric. [Figure: introduction image showing the decomposition of overall WER into semantic categories such as people and geographic names]
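
The Towards AI article's premise is that a single word error rate hides which kinds of words a speech recognizer gets wrong. WER is (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions in a minimum-edit alignment of the hypothesis against an N-word reference. A minimal sketch of a per-category breakdown, assuming every reference word has already been tagged with a semantic category, aligns the two word sequences with Levenshtein distance and charges each substitution or deletion to the category of the reference word it affects. The tags, the example sentence, and the `align` / `category_error_rates` helpers are hypothetical illustrations, not the article's framework; insertions have no reference word, so here they count only toward overall WER.

```python
from collections import Counter
from typing import Optional


def align(ref: list[str], hyp: list[str]) -> list[tuple[str, Optional[int]]]:
    """Levenshtein-align hyp to ref and return edit ops as (op, ref_index).

    op is 'C' (correct), 'S' (substitution), 'D' (deletion) or 'I' (insertion);
    ref_index is None for insertions.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Walk back from dp[n][m] to recover one optimal alignment.
    ops: list[tuple[str, Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            ops.append(("C" if ref[i - 1] == hyp[j - 1] else "S", i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("D", i - 1))
            i -= 1
        else:
            ops.append(("I", None))
            j -= 1
    return ops[::-1]


def category_error_rates(ref: list[str], hyp: list[str], tags: list[str]):
    """Return overall WER and a per-category error rate over reference words."""
    errors: Counter = Counter()
    totals = Counter(tags)
    insertions = 0
    for op, idx in align(ref, hyp):
        if op == "I":
            insertions += 1
        elif op in ("S", "D"):
            errors[tags[idx]] += 1
    wer = (sum(errors.values()) + insertions) / len(ref)
    return wer, {cat: errors[cat] / totals[cat] for cat in totals}


# Hypothetical example with hand-assigned category tags.
ref = "meet anna in lisbon on friday".split()
tags = ["other", "person", "other", "geo", "other", "date"]
hyp = "meet hannah in lisbon friday".split()
wer, per_category = category_error_rates(ref, hyp, tags)
print(f"WER={wer:.2f}", per_category)
```

Here the overall WER is 2/6, about 0.33, but the breakdown shows the only person name was wrong (error rate 1.0) while the place name was right. A real pipeline would also need a way to tag reference words, for example a named-entity recognizer; this sketch assumes the tags are given.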
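
The Error-Aware TF-IDF paper uses retrieval-augmented generation to help correct ASR outputs that hallucinate rare entities and domain terms. Its abstract is truncated before it explains what makes the retrieval "error-aware", so the sketch below shows only generic TF-IDF retrieval over character trigrams, a common way to let misrecognized spellings still match lexicon entries; it is not the paper's method. The lexicon, the queries, and the `build_index` / `retrieve` helpers are hypothetical, and only the standard library is used.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams with boundary padding, so near-miss spellings share grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_index(docs: list[str]):
    """Return per-document TF-IDF vectors (dicts) and a smoothed IDF table."""
    grams = [Counter(char_ngrams(d)) for d in docs]
    df = Counter(g for counts in grams for g in counts)  # document frequency per gram
    n_docs = len(docs)
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}
    vectors = [{g: tf * idf[g] for g, tf in counts.items()} for counts in grams]
    return vectors, idf


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank lexicon entries by TF-IDF cosine similarity to a (possibly misrecognized) query."""
    vectors, idf = build_index(docs)
    # Query grams never seen in the lexicon have no IDF weight and are dropped.
    q = {g: tf * idf[g] for g, tf in Counter(char_ngrams(query)).items() if g in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


# Hypothetical domain lexicon and ASR-style misrecognitions.
lexicon = ["Tchaikovsky", "Reykjavik", "metformin", "Kubernetes"]
print(retrieve("chai kovsky", lexicon, k=1))
print(retrieve("metforman", lexicon, k=1))
```

Character trigrams let the misrecognized "chai kovsky" still share most of its grams with "Tchaikovsky"; how retrieved terms are then handed to an LLM for correction is outside this sketch.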