max_pixels is a token budget in disguise — and the right cap depends on the size of what you're looking for

Qwen2.5-VL image token budget affects accuracy

By the PulseAugur editorial team · [1 source] · 2026-06-11 07:38

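The `max_pixels` setting in Qwen2.5-VL is effectively a token budget in disguise, and default settings often produce a budget far higher than the recommended one. That can degrade performance, especially when the target in the image is large. The best token budget depends on the size of the object being sought: smaller targets benefit from a larger budget, while larger targets do best at lower token counts.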
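The link between a pixel cap and a token count can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it assumes a 14-pixel vision patch with 2×2 patch merging (so one visual token covers roughly 28×28 pixels), and the function names `visual_token_budget` and `estimate_visual_tokens` are hypothetical helpers introduced here. The resize logic only approximates how a runtime might fit an image between `min_pixels` and `max_pixels`.

```python
import math

# Assumptions (not stated in the summary above): 14-pixel patches merged 2x2,
# so one visual token covers roughly a 28x28 pixel area.
PATCH = 14
MERGE = 2
FACTOR = PATCH * MERGE  # 28


def visual_token_budget(max_pixels: int) -> int:
    """Upper bound on visual tokens implied by a max_pixels cap."""
    return max_pixels // (FACTOR * FACTOR)


def estimate_visual_tokens(height: int, width: int,
                           min_pixels: int, max_pixels: int) -> int:
    """Rough token count for one image after fitting it into the pixel range."""
    # Snap each side to a multiple of FACTOR.
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        scale = math.sqrt((height * width) / max_pixels)
        h = max(FACTOR, math.floor(height / scale / FACTOR) * FACTOR)
        w = max(FACTOR, math.floor(width / scale / FACTOR) * FACTOR)
    elif h * w < min_pixels:
        # Too small: grow both sides by the same ratio, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / FACTOR) * FACTOR
        w = math.ceil(width * scale / FACTOR) * FACTOR
    return (h // FACTOR) * (w // FACTOR)


if __name__ == "__main__":
    for cap_tokens in (256, 1280, 16384):
        cap = cap_tokens * FACTOR * FACTOR
        used = estimate_visual_tokens(1080, 1920, 4 * FACTOR * FACTOR, cap)
        print(f"max_pixels={cap:>10} budget={visual_token_budget(cap):>6} used={used}")
```

Under these assumptions, dividing `max_pixels` by 784 gives the ceiling on visual tokens per image, which is why changing the cap changes cost and, per the summary, accuracy.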

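Impact: Tuning `max_pixels` can improve the accuracy and efficiency of multimodal models, especially in applications that involve object detection or localization.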

Ranking rationale: This cluster discusses a technical finding about model configuration and its effect on performance, supported by experimental data.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to (LLM tag) · Tier 1 · English (EN) · Niv Dvir · 2026-06-11 07:38

max_pixels is a token budget in disguise — and the right cap depends on the size of what you're looking for

Run the same image through the same Qwen2.5-VL model on different runtimes, and it can cost anywhere from 8 to 16,384 visual tokens — a 2,000× spread — depending on which inference stack you picked. Nobody changed the model. They just disagree about one config…

