Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

VLMs in production: do fixed ViT patches still dominate?

By the PulseAugur editorial team · [1 source] · 2026-05-21 14:46

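A discussion thread on Reddit's r/MachineLearning asks whether current production vision-language models (VLMs) still rely on fixed-patch Vision Transformers (ViTs) for visual processing. The poster asks whether the major VLM developers have adopted more efficient, input-adaptive tokenization methods, and speculates about why fixed patches might persist: marginal gains, pipeline efficiency, or scaling laws for dynamic patching that are not yet mature.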
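For readers unfamiliar with the term, fixed-patch tokenization can be sketched as follows. This is a minimal illustration, not a description of any production model: the function names, the 16-pixel patch size, and the toy image are assumptions chosen for the example. It shows why the token count depends only on resolution, never on image content, which is the inefficiency that input-adaptive tokenizers aim to address.

```python
def fixed_patch_grid(height, width, patch=16):
    """Return (rows, cols) of the patch grid for a fixed-patch tokenizer."""
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    return height // patch, width // patch


def fixed_patch_token_count(height, width, patch=16):
    """Number of visual tokens; depends on resolution only, not on content."""
    rows, cols = fixed_patch_grid(height, width, patch)
    return rows * cols


def patchify(image, patch):
    """Split a 2D list of pixels into flattened patch x patch tiles (row-major)."""
    rows, cols = fixed_patch_grid(len(image), len(image[0]), patch)
    tokens = []
    for r in range(rows):
        for c in range(cols):
            tile = [
                image[r * patch + i][c * patch + j]
                for i in range(patch)
                for j in range(patch)
            ]
            tokens.append(tile)
    return tokens


# A 224x224 image with 16-pixel patches yields a 14x14 grid.
print(fixed_patch_token_count(224, 224))  # 196

# A blank image and a detailed one of the same size get the same token count.
blank = [[0] * 4 for _ in range(4)]
detailed = [[4 * y + x for x in range(4)] for y in range(4)]
print(len(patchify(blank, 2)), len(patchify(detailed, 2)))  # 4 4
```

Doubling the side length to 448 pixels quadruples the count to 784 tokens regardless of how much detail the image contains.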

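Impact: The discussion highlights a technical detail of current VLM implementations that may shape their future development or how their capabilities are understood.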

Ranking rationale: This is a Reddit discussion thread about technical aspects of VLMs, not a primary-source announcement or a research paper.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

r/MachineLearning TIER_1 · /u/howtorewriteaname · 2026-05-21 14:46

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

The research community has provided (already for some time) seemingly more efficient and effective tokenizations for vision. Do we have any hint on whether non-fixed-patches tokenization is being applied on the big player models?

I imagine…
