PulseAugur

Open-source AI agent surpasses Gemini and GPT-4 on TerminalBench 2.0

OSS Agent I, an open-source AI agent developed in Turkey, has achieved a 65.2% success rate on the TerminalBench 2.0 benchmark. According to the source posts, this result surpasses Google's Gemini-3-flash-preview, Junie CLI, GPT-4, and Anthropic's Claude 3. The developer states that no cheating mechanisms were used to obtain the score.

IMPACT Demonstrates significant progress in open-source AI agents' ability to autonomously complete complex real-world tasks.

RANK_REASON Open-source model release achieving a notable benchmark result.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. Mastodon (mastodon.social) · TIER_1 · English (EN) · aihaberleri

    📰 Open-Source AI Agent Scores 65.2% on TerminalBench 2.0 in 2026, Beating Gemini and Junie CLI An open-source AI agent has achieved a record 65.2% success rate on TerminalBench 2.0, surpassing Google's Gemini-3-flash-preview and Junie CLI. The developer confirms no cheating mecha…

  2. Mastodon (mastodon.social) · TIER_1 · Turkish (TR) · aihaberleri

    📰 OSS Agent I Ranked First on TerminalBench in 2026: Turkish AI Surpassed GPT-4 and Claude 3. OSS Agent I, developed in Turkey, rose to first place on TerminalBench, described as the world's most demanding terminal environment test. This achievement, involving AI independently handling real-world tasks…