PulseAugur
research · [6 sources]

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

Researchers have developed a new method called Token-Aware Gradient Optimization (TAGO) to improve the efficiency of jailbreak attacks on audio language models (ALMs). TAGO identifies and uses only the most impactful audio token gradients, substantially reducing the computation these attacks require while maintaining high success rates. The result demonstrates that dense waveform updates are largely unnecessary and suggests that future work on audio safety alignment should focus on this token-level gradient structure.
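The core idea, sparsifying an adversarial update to only the tokens with the largest gradient magnitudes, can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, shapes, and the top-k-by-norm selection rule are assumptions for illustration, using plain NumPy in place of a real ALM's backward pass.

```python
import numpy as np

def token_aware_update(grads, perturbation, step_size, k):
    """Sparse sign-gradient step: update only the k tokens whose
    per-token gradient norms are largest (hypothetical sketch).

    grads:        (n_tokens, dim) gradient of the attack loss w.r.t.
                  each audio token's perturbation
    perturbation: (n_tokens, dim) current adversarial perturbation
    """
    # Rank tokens by gradient magnitude; large norm = most impactful.
    norms = np.linalg.norm(grads, axis=1)
    topk = np.argsort(norms)[-k:]

    mask = np.zeros(len(norms), dtype=bool)
    mask[topk] = True

    # A dense attack would step on every row; here only k rows move.
    perturbation = perturbation.copy()
    perturbation[mask] -= step_size * np.sign(grads[mask])
    return perturbation, mask
```

Under this sketch, per-step cost scales with `k` rather than the full token count, which is the efficiency claim the summary describes; the actual selection criterion and update rule in TAGO may differ.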

Summary written from 6 sources. How we write summaries →

IMPACT This research could lead to more efficient methods for testing and improving the safety of audio language models.

RANK_REASON Academic paper detailing a new method for attacking audio language models.

Read on arXiv cs.CL →

COVERAGE [6]

  1. arXiv cs.LG TIER_1 · Adhiraj Banerjee, Vipul Arora ·

    PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

    arXiv:2605.06582v1 Announce Type: new Abstract: Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audi…

  2. arXiv cs.CL TIER_1 · Vipul Arora ·

    PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

    Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, o…

  3. arXiv cs.LG TIER_1 · Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang, Zhijin Ge ·

    Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

    arXiv:2605.04700v1 Announce Type: cross Abstract: Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity…

  4. arXiv cs.CL TIER_1 · Zhijin Ge ·

    Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

    Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the struc…

  5. arXiv cs.CL TIER_1 · Zhijie Huang, Stephen McIntosh, Daisuke Saito, Nobuaki Minematsu ·

    Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling

    arXiv:2602.00594v2 Announce Type: replace Abstract: A good language model starts with a good tokenizer. Tokenization is especially important for speech modeling, which must handle continuous signals that mix linguistic and non-linguistic information. A speech tokenizer should ext…

  6. arXiv cs.CL TIER_1 · Mingxuan Wang, Satoshi Nakamura ·

    TokenChain: A Discrete Speech Chain via Semantic Token Modeling

    arXiv:2510.06201v2 Announce Type: replace-cross Abstract: Machine Speech Chain, simulating the human perception-production loop, proves effective in jointly improving ASR and TTS. We propose TokenChain, a fully discrete speech chain coupling semantic-token ASR with a two-stage TT…