quicktok: a faster tokenizer (exact and byte-identical with tiktoken) [P]

New C++ tokenizer 'quicktok' is up to 11x faster than tiktoken

By PulseAugur Editorial · [1 source] · 2026-06-16 04:24

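A new C++ tokenizer named quicktok offers a substantial speedup over existing solutions. It produces token ids that are byte-identical to tiktoken's while running 2-3.6x faster than bpe-openai and 4-11x faster than tiktoken itself. The tokenizer supports multiple models, including cl100k, o200k, GPT-OSS, Llama-3, and Qwen2.5/3, and relies on data-structure engineering for its performance.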
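To make "byte-identical" concrete, here is a minimal Python sketch of reference BPE merging together with an equivalence check between two encoders. It is not quicktok's code or API: the merge table `TOY_RANKS` and the helper names `bpe_encode` and `first_mismatch` are hypothetical and exist only for illustration. The idea is that a faster implementation counts as exact only if it returns the same token ids as the reference on every input.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical toy merge table: byte sequence -> rank (also used as token id).
# Lower rank merges first. Real vocabularies are far larger.
TOY_RANKS: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
TOY_RANKS.update({b"to": 256, b"ke": 257, b"toke": 258, b"token": 259})


def bpe_encode(data: bytes, ranks: Dict[bytes, int] = TOY_RANKS) -> List[int]:
    """Reference BPE: repeatedly merge the adjacent pair with the lowest rank."""
    parts: List[bytes] = [bytes([b]) for b in data]
    while len(parts) > 1:
        best_rank: Optional[int] = None
        best_i = -1
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            # Strict '<' keeps the leftmost pair when the same pair repeats.
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_i = rank, i
        if best_rank is None:
            break  # no adjacent pair is in the merge table
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return [ranks[p] for p in parts]


def first_mismatch(
    reference: Callable[[bytes], List[int]],
    candidate: Callable[[bytes], List[int]],
    corpus: Sequence[bytes],
) -> Optional[int]:
    """Return the index of the first input where the token ids differ, else None."""
    for i, text in enumerate(corpus):
        if reference(text) != candidate(text):
            return i
    return None


if __name__ == "__main__":
    print(bpe_encode(b"token"))
    print(first_mismatch(bpe_encode, bpe_encode, [b"token", b"tokens"]))
```

The quadratic scan in `bpe_encode` is the naive baseline; a fast tokenizer would replace it with better data structures while keeping the output unchanged, which is exactly what `first_mismatch` is meant to check.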

Impact: Speeds up tokenization workflows and may accelerate LLM inference and training pipelines.

Ranking rationale: This cluster describes a new open-source software release for a specific AI task (tokenization), accompanied by benchmark results.


Infrastructure

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/MachineLearning · /u/_casa_nova_ · 2026-06-16 04:24

quicktok: a faster tokenizer (exact and byte-identical with tiktoken) [P]

Been working on this a while! Should be useful for anyone trying to speed up their tokenization workflows. quicktok is a fast/exact BPE tokenizer written in C++. Token ids are byte-identical to tiktoken and en…
