PulseAugur

Open-source AI agent surpasses Gemini and GPT-4 on TerminalBench 2.0

An open-source AI agent developed in Turkey, OSS Agent I, has achieved a 65.2% success rate on the TerminalBench 2.0 benchmark. According to the reports, this result surpasses Google's Gemini-3-flash-preview and Junie CLI, as well as GPT-4 and Anthropic's Claude 3. The developer states that no cheating mechanisms were used; if accurate, the score reflects the agent's actual ability to handle complex terminal tasks.

Impact: Demonstrates significant progress in the ability of open-source AI agents to autonomously complete complex real-world tasks.

Ranking rationale: An open-source model release achieving a notable benchmark result.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. Mastodon (mastodon.social) · TIER_1 · English (EN) · aihaberleri

    📰 Open-Source AI Agent Scores 65.2% on TerminalBench 2.0 in 2026, Beating Gemini and Junie CLI An open-source AI agent has achieved a record 65.2% success rate on TerminalBench 2.0, surpassing Google's Gemini-3-flash-preview and Junie CLI. The developer confirms no cheating mecha…

  2. Mastodon (mastodon.social) · TIER_1 · Turkish (TR) · aihaberleri

    📰 OSS Agent I Ranked First on TerminalBench in 2026: Turkish AI Surpassed GPT-4 and Claude 3. OSS Agent I, developed in Turkey, rose to first place on TerminalBench, the world's toughest terminal-environment test. This achievement … AI's ability to independently … real-world tasks…