[NEW MODEL] SupraLabs just released SupraVL-Nano-900k, a Vision-Language Model built entirely from scratch!

SupraLabs releases a transparent vision-language model built from scratch

By the PulseAugur editorial team · [1 source] · 2026-06-19 02:53

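SupraLabs has released SupraVL-Nano-900k, a vision-language model built entirely from scratch. The model has roughly 900,000 parameters, was trained on the Flickr8k dataset, and is intended as a transparent, educational blueprint for understanding image-to-text models. Its architecture consists of a CNN vision encoder and a GPT-2-style Transformer decoder, and all components are documented and accessible.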
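To give a feel for how a CNN encoder plus a GPT-2-style decoder can fit in a budget of under a million parameters, here is a rough parameter-counting sketch. Every number in it (channel widths, `d_model`, layer count, vocabulary size, context length) is a hypothetical placeholder chosen for illustration, not the published SupraVL-Nano-900k configuration, and the helper names are made up for this example.

```python
# Rough parameter budget for a tiny CNN-encoder + GPT-2-style-decoder captioner.
# All sizes below are HYPOTHETICAL; they are not SupraLabs' actual settings.

def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

def conv2d_params(c_in, c_out, kernel):
    """Weights plus bias of a square-kernel 2D convolution."""
    return c_in * c_out * kernel * kernel + c_out

def layernorm_params(d):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d

def gpt2_block_params(d_model):
    """One GPT-2-style block: self-attention, 4x-wide MLP, two LayerNorms."""
    attn = linear_params(d_model, 3 * d_model) + linear_params(d_model, d_model)
    mlp = linear_params(d_model, 4 * d_model) + linear_params(4 * d_model, d_model)
    return attn + mlp + 2 * layernorm_params(d_model)

def encoder_params(channels, kernel, d_model):
    """Stack of convolutions followed by a linear projection to d_model."""
    convs = sum(conv2d_params(a, b, kernel) for a, b in zip(channels, channels[1:]))
    return convs + linear_params(channels[-1], d_model)

def decoder_params(vocab, context, d_model, n_layers):
    """Token + position embeddings, blocks, final LayerNorm (output head tied)."""
    embeddings = vocab * d_model + context * d_model
    return embeddings + n_layers * gpt2_block_params(d_model) + layernorm_params(d_model)

# Hypothetical configuration, picked only to land in the same size class.
config = {"channels": (3, 16, 32, 64, 96), "kernel": 3,
          "d_model": 96, "n_layers": 4, "vocab": 3000, "context": 32}

enc = encoder_params(config["channels"], config["kernel"], config["d_model"])
dec = decoder_params(config["vocab"], config["context"],
                     config["d_model"], config["n_layers"])
total = enc + dec
print(f"encoder={enc:,} decoder={dec:,} total={total:,}")
```

With these placeholder sizes, the token embedding and the Transformer blocks account for most of the budget and the convolutional encoder for a much smaller share. That split is a property of this illustrative configuration only.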

Impact: Provides a transparent, accessible blueprint for understanding vision-language model architecture and training.

Ranking rationale: A new, small-scale model was released, with the emphasis on transparency and educational value rather than frontier performance.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Dangerous_Try3619 · 2026-06-19 02:53

[NEW MODEL] SupraLabs just released SupraVL-Nano-900k, a Vision-Language Model built entirely from scratch!

Hey r/LocalLLaMA! We just released SupraVL-Nano-900k, our first VLM. It has ~900k parameters, was trained from scratch on Flickr8k, and the entire architecture fits in a single Jupyter notebook. This i…

