TOOL · arXiv cs.AI English(EN) · 2d

Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals

Researchers propose using wavelets as a common tokenizer for audio, images, and video, in place of modality-specific latent grids. Their preliminary model pairs a Haar DWT/IDWT frontend with a coefficient-token layout shared across modalities, and it achieved notable PSNR scores on benchmark datasets for speech, images, and video. The study suggests that a unified wavelet token schema could be viable, and further experiments indicate that sparse training and energy-based selection of coefficients offer efficient compression strategies.
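To make the ingredients named in the summary concrete, here is a minimal sketch of a single-level 1-D Haar DWT/IDWT pair, a top-k magnitude filter standing in for energy selection, and a PSNR measure. It is an illustration under stated assumptions, not the paper's implementation: the function names, the single decomposition level, the top-k rule, and the 64-sample sine test signal are all hypothetical choices made for this example.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal 1-D Haar DWT on an even-length signal."""
    x = np.asarray(x, dtype=np.float64)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass (pairwise averages)
    detail = (even - odd) / np.sqrt(2.0)  # high-pass (pairwise differences)
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt by re-interleaving the even and odd samples."""
    approx = np.asarray(approx, dtype=np.float64)
    detail = np.asarray(detail, dtype=np.float64)
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


def keep_top_energy(coeffs, k):
    """Zero all but the k largest-magnitude coefficients.

    Assumption: a simple top-k rule used here as a stand-in for
    energy selection; the paper's actual criterion may differ.
    """
    coeffs = np.asarray(coeffs, dtype=np.float64)
    kept = np.zeros_like(coeffs)
    if k > 0:
        idx = np.argsort(np.abs(coeffs))[-k:]
        kept[idx] = coeffs[idx]
    return kept


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals with the given peak value."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# Hypothetical demo: tokenize a sine wave, keep half the coefficients, reconstruct.
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
approx, detail = haar_dwt(signal)
tokens = np.concatenate([approx, detail])          # flat coefficient "tokens"
sparse = keep_top_energy(tokens, k=32)
recon = haar_idwt(sparse[:32], sparse[32:])
quality_db = psnr(signal, recon, peak=1.0)
```

Because the Haar transform is orthonormal, keeping every coefficient reconstructs the signal exactly and preserves its energy, so any PSNR loss in this sketch comes only from the coefficients that are dropped.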

IMPACT Proposes a unified tokenization approach for multi-modal AI, potentially simplifying model architectures and improving efficiency.

EuroSAT RGB
Haar DWT/IDWT
DAVIS 2017
Speech Commands