PulseAugur

TokenChain: A Discrete Speech Chain via Semantic Token Modeling

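Researchers have developed a new method, Token-Aware Gradient Optimization (TAGO), to make jailbreak attacks on audio language models (ALMs) more efficient. TAGO identifies and uses only the most influential audio-token gradients, which substantially reduces the computation these attacks require. The method maintains a high success rate, indicating that dense waveform updates are largely unnecessary, and the authors suggest that future research focus on the token-level gradient structure relevant to audio safety alignment.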

Impact: This research could lead to more effective ways to test and improve the safety of audio language models.

Ranking rationale: An academic paper detailing a new method for attacking audio language models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

  1. arXiv cs.LG TIER_1 English(EN) · Adhiraj Banerjee, Vipul Arora

    PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

    arXiv:2605.06582v1. Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audi…

  2. arXiv cs.CL TIER_1 English(EN) · Vipul Arora

    PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

    Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, o…

  3. arXiv cs.LG TIER_1 English(EN) · Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang, Zhijin Ge

    Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

    arXiv:2605.04700v1. Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity…

  4. arXiv cs.CL TIER_1 English(EN) · Zhijin Ge

    Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

    Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the struc…

  5. arXiv cs.CL TIER_1 Dansk(DA) · Zhijie Huang, Stephen McIntosh, Daisuke Saito, Nobuaki Minematsu

    Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling

    arXiv:2602.00594v2. A good language model starts with a good tokenizer. Tokenization is especially important for speech modeling, which must handle continuous signals that mix linguistic and non-linguistic information. A speech tokenizer should ext…

  6. arXiv cs.CL TIER_1 English(EN) · Mingxuan Wang, Satoshi Nakamura

    TokenChain: A Discrete Speech Chain via Semantic Token Modeling

    arXiv:2510.06201v2. Machine Speech Chain, simulating the human perception-production loop, proves effective in jointly improving ASR and TTS. We propose TokenChain, a fully discrete speech chain coupling semantic-token ASR with a two-stage TT…