Brief · PulseAugur

TOOL · X — Perplexity English(EN) · 2w · [4 sources]

We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x.

Perplexity AI has open-sourced a rebuilt Unigram tokenizer designed to significantly reduce CPU cost. The new tokenizer achieves 5x lower latency than HuggingFace's implementation and 2x lower latency than SentencePiece C++. It targets large vocabularies, such as XLM-RoBERTa's 250K-token Unigram vocabulary, which is commonly used in ranking and retrieval tasks.

IMPACT Accelerates inference for AI models by reducing tokenization latency on CPUs.

Perplexity AI
XLM-RoBERTa
HuggingFace tokenizers
Unigram tokenizer
SentencePiece C++