PulseAugur

Local LLM users question token output differences between thinking and response

A user on the r/LocalLLaMA subreddit asks why a local LLM appears to generate far more tokens in its "thinking" output than in its final response. They wanted the model to categorize a list of 1,000 items, but the response covered only about 100 items and their categories, while the thinking trace, which works through the input and produces intermediate text, was significantly longer. The user wonders whether this thinking capability, which seems to be present in most models, could be used to categorize a large dataset without a specialized model or external tools such as vector databases.

IMPACT N/A

RANK_REASON User question about LLM behavior, not a new release or significant event.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/iMakeSense

    Why does Thinking Output More Tokens Than a Response?

    I was too lazy to use a vector DB + Embedding + Clustering for this list of 1000 items I wanted to categorize. I was hoping to use a local LLM to do it, but it would only respond with a list of about 100 items or so and their categories. …