Why does Thinking Output More Tokens Than a Response?

Local LLM user questions the gap in token output between thinking and the response

By the PulseAugur editorial team · [1 source] · 2026-05-30 16:12

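A user on the r/LocalLLaMA subreddit is asking about the gap between the number of tokens a local LLM generates for its final response and the number it produces during its internal "thinking" process. They observed that the model's thinking, which includes processing the input and generating intermediate text, appeared to output more tokens than the final categorized list they were trying to produce. The user wonders whether this "thinking" capability, which seems to be present in most models, could be used for tasks such as categorizing a large dataset without specialized models or external tools such as a vector database.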
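One common way to work around the truncated output described here is to split the list into batches small enough for each response to fit the model's output budget. The sketch below illustrates that idea under stated assumptions and is not the poster's method: `classify_batch` is a hypothetical stand-in for a single request to a local model, `fake_classifier` is a placeholder that makes the example runnable, and the batch size of 50 is arbitrary.

```python
from typing import Callable, Dict, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def categorize_all(
    items: List[str],
    classify_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 50,
) -> Dict[str, str]:
    """Categorize every item by sending small batches to `classify_batch`.

    `classify_batch` is hypothetical: it stands in for one request to a
    local LLM and is assumed to return a mapping of item -> category.
    """
    results: Dict[str, str] = {}
    for batch in batched(items, batch_size):
        labels = classify_batch(batch)
        # Keep labels only for items that were actually sent in this batch,
        # so anything the model invents is ignored.
        for item in batch:
            if item in labels:
                results[item] = labels[item]
    return results


def missing_items(items: List[str], results: Dict[str, str]) -> List[str]:
    """Return the items that never received a category, e.g. to retry them."""
    return [item for item in items if item not in results]


def fake_classifier(batch: List[str]) -> Dict[str, str]:
    # Placeholder for a real model call (assumption): label by first letter.
    return {item: item[0].upper() for item in batch}


if __name__ == "__main__":
    items = [f"item{i}" for i in range(1000)]
    results = categorize_all(items, fake_classifier, batch_size=50)
    print(len(results), "categorized;", len(missing_items(items, results)), "missing")
```

Checking `missing_items` after each pass makes it easy to see when a response was cut short, so only the unlabelled items need to be sent again.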

Impact: Not applicable

Ranking rationale: A user question about LLM behavior, not a new release or major event.



AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 · /u/iMakeSense · 2026-05-30 16:12

Why does Thinking Output More Tokens Than a Response?

I was too lazy to use a vector DB + Embedding + Clustering for this list of 1000 items I wanted to categorize. I was hoping to use a local LLM to do it, but it would only respond with a list of about 100 items or so and their categories. …

