PulseAugur
English(EN) 📰 Open-Source AI Agent Scores 65.2% on TerminalBench 2.0 in 2026, Beating Gemini and Junie CLI An open-source AI agent has achieved a record 65.2% success rate

Open-Source AI Agent Surpasses Gemini and GPT-4 on TerminalBench 2.0

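OSS Agent I, an open-source AI agent developed in Turkey, achieved a 65.2% success rate on the TerminalBench 2.0 benchmark. That result surpasses established models such as Google's Gemini-3-flash-preview, GPT-4, and Anthropic's Claude 3. The developer has confirmed that no cheating mechanisms were used, which underscores the agent's genuine ability to handle complex terminal tasks.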

Impact: Demonstrates significant progress by open-source AI agents in autonomously completing complex real-world tasks.

Ranking rationale: An open-source model release achieved notable benchmark results.


AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. Mastodon — mastodon.social TIER_1 English(EN) · aihaberleri ·

    📰 Open-Source AI Agent Scores 65.2% on TerminalBench 2.0 in 2026, Beating Gemini and Junie CLI An open-source AI agent has achieved a record 65.2% success rate on TerminalBench 2.0, surpassing Google's Gemini-3-flash-preview and Junie CLI. The developer confirms no cheating mecha…

  2. Mastodon — mastodon.social TIER_1 Türkçe(TR) · aihaberleri ·

    📰 OSS Agent I Ranked First on TerminalBench in 2026: Turkish AI Surpassed GPT-4 and Claude 3. OSS Agent I, developed in Turkey, rose to first place on TerminalBench, described as the world's most demanding terminal-environment test. The achievement bears on AI's independent handling of real-world tasks…