ENTITY
o200k_base
PulseAugur coverage of o200k_base — every cluster mentioning o200k_base across labs, papers, and developer communities, ranked by signal.
Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
SENTIMENT · 30D: 2 days with sentiment data
RECENT · PAGE 1/1 · 2 TOTAL
- African languages face significant tokenization penalty in frontier LLMs
  A new research paper reports an "African Language Tax" in frontier large language models: their tokenizers split African-language text into substantially more subword tokens than comparable English text. This results in hi…
- New BrahmicTokenizer-131K improves Indic language tokenization efficiency
  Researchers have developed BrahmicTokenizer-131K, a new tokenizer designed to improve tokenization efficiency for Indic languages while maintaining performance on English and code. This tokenizer achieves a 26.7% reduction in token …