Standard Intelligence is developing a novel approach to training general AI agents that learns from raw video of computer usage rather than relying on language-based methods. The company's thesis is that scaling action data through video is the most promising path to capable agents. It has built a massive dataset of computer actions and an efficient video encoder, which enable its foundation model, FDM-1, to perform complex tasks such as CAD design and autonomous driving after fine-tuning.
Impact: This video-centric pre-training approach could unlock new agent capabilities by scaling action data more effectively than language-based methods.
Ranking rationale: The cluster describes a new pre-training paradigm and a foundation model from a startup, detailed in a blog post that reads like a research paper.
AI-generated summary · Google Gemini · from 1 source.