New self-supervised audio model BEST-RQ-2 improves transfer learning

By PulseAugur Editorial · [1 source] · 2026-07-01 04:00

Researchers have introduced BEST-RQ-2, an evolution of the BEST-RQ method for self-supervised audio representation learning. It splits pretraining into two steps, contextualization and prediction: a ViT context encoder processes only the unmasked regions, and a lightweight predictor then makes predictions for the masked regions. On the X-ARES and XARES-LLM benchmarks, BEST-RQ-2 outperforms single-stage methods while keeping inference compute comparable. The code and checkpoints are publicly available.
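The two-step data flow described above can be pictured with a toy NumPy sketch. It is not the authors' code: the single-head attention blocks, the 4-versus-1 block counts, the 60% masking ratio, the random stand-in targets, and every name (`attention_block`, `forward`, `extract_features`, `cross_entropy`) are hypothetical choices made only to show the shape of contextualize-then-predict, in which the encoder never sees masked patches and a small predictor fills in mask tokens using the encoder's context.

```python
import numpy as np

# Toy sizes and hyperparameters (hypothetical, not taken from the paper)
NUM_PATCHES, DIM, CODEBOOK = 64, 32, 256
MASK_RATIO = 0.6
rng = np.random.default_rng(0)


def new_block():
    """Random query/key/value weights for one toy attention block."""
    return tuple(rng.normal(0.0, 0.1, (DIM, DIM)) for _ in range(3))


def attention_block(x, block):
    """Single-head self-attention with a residual connection (no MLP, no norm)."""
    wq, wk, wv = block
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(DIM)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return x + weights @ v


# "Heavy" context encoder (4 blocks) vs "lightweight" predictor (1 block)
encoder = [new_block() for _ in range(4)]
predictor = [new_block()]
pos_emb = rng.normal(0.0, 0.1, (NUM_PATCHES, DIM))
mask_token = rng.normal(0.0, 0.1, DIM)
head = rng.normal(0.0, 0.1, (DIM, CODEBOOK))  # logits over discrete targets


def forward(patches, mask):
    """Contextualize the visible patches, then predict the masked ones."""
    # Step 1, contextualize: the encoder only ever sees unmasked patches
    ctx = patches[~mask] + pos_emb[~mask]
    for block in encoder:
        ctx = attention_block(ctx, block)
    # Step 2, predict: mask tokens (plus positions) fill the masked slots,
    # and the small predictor attends over context and mask tokens together
    full = np.empty((NUM_PATCHES, DIM))
    full[~mask] = ctx
    full[mask] = mask_token + pos_emb[mask]
    for block in predictor:
        full = attention_block(full, block)
    return full[mask] @ head  # one row of logits per masked patch


def extract_features(patches):
    """Assumed downstream use: run only the encoder, on all patches."""
    x = patches + pos_emb
    for block in encoder:
        x = attention_block(x, block)
    return x


def cross_entropy(logits, targets):
    """Mean cross-entropy of integer targets under softmax(logits)."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


# Usage with random stand-in data
patches = rng.normal(size=(NUM_PATCHES, DIM))  # hypothetical patch embeddings
mask = np.zeros(NUM_PATCHES, dtype=bool)
mask[rng.permutation(NUM_PATCHES)[: int(MASK_RATIO * NUM_PATCHES)]] = True
targets = rng.integers(0, CODEBOOK, size=NUM_PATCHES)  # stand-in discrete targets
logits = forward(patches, mask)
loss = cross_entropy(logits, targets[mask])
print(logits.shape)  # (38, 256)
```

Because `forward` reads only `patches[~mask]`, changing the content of masked patches leaves its predictions unchanged; the training signal comes entirely from predicting those patches' targets. Keeping the predictor small fits the summary's point about comparable inference compute, under the assumption (not confirmed by the summary) that only the encoder is kept after pretraining.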

IMPACT Introduces a novel approach to self-supervised audio learning, potentially improving performance on various audio tasks and benchmarks.

RANK_REASON The cluster contains a research paper detailing a new method for self-supervised audio representations.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.AI TIER_1 English (EN) · Ludovic K. Tuncay (IRIT-SAMoVA), Etienne Labbé (IRIT-SAMoVA), Thomas Pellegrini (IRIT-SAMoVA) · 2026-07-01 04:00

BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Approach for Self-Supervised Audio Representations

arXiv:2606.30700v1 Announce Type: cross Abstract: Self-supervised learning enables audio representations that transfer across domains and tasks. We present BEST-RQ-2, an evolution of BEST-RQ that retains frozen random-projection-based discrete targets while introducing a two-step …
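The "frozen random-projection-based discrete targets" that BEST-RQ-2 retains come from a random-projection quantizer. The sketch below follows the quantizer of the original BEST-RQ (Chiu et al., 2022): standardize the input frames, project them with a frozen random matrix, L2-normalize, and take the index of the nearest entry in a frozen random codebook. The excerpt does not say whether BEST-RQ-2 changes any detail (input features, code dimension, codebook size, number of codebooks), so the sizes and the names `make_quantizer` and `quantize` are hypothetical.

```python
import numpy as np


def make_quantizer(feat_dim, code_dim, codebook_size, seed=0):
    """Create a frozen random projection and codebook; neither is ever trained."""
    rng = np.random.default_rng(seed)
    # Xavier/Glorot-normal initialization for the projection matrix
    std = np.sqrt(2.0 / (feat_dim + code_dim))
    projection = rng.normal(0.0, std, size=(feat_dim, code_dim))
    # Standard-normal codebook, L2-normalized once up front
    codebook = rng.normal(size=(codebook_size, code_dim))
    codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
    return projection, codebook


def quantize(features, projection, codebook):
    """Return one discrete target index per frame (row) of `features`."""
    # Standardize each feature dimension to zero mean, unit variance
    x = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-5)
    z = x @ projection
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    # For unit vectors, the nearest codeword in L2 distance is the one
    # with the largest dot product
    return np.argmax(z @ codebook.T, axis=1)


rng = np.random.default_rng(1)
mel = rng.normal(size=(100, 80))  # hypothetical 100 frames x 80 mel bins
projection, codebook = make_quantizer(feat_dim=80, code_dim=16, codebook_size=1024)
targets = quantize(mel, projection, codebook)
print(targets.shape)  # (100,)
```

Because the projection and codebook stay frozen, the same input always maps to the same targets, so the network learns to predict a fixed tokenization instead of co-training one.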
