An open-source AI agent developed in Turkey, named OSS Agent I, has achieved a 65.2% success rate on the TerminalBench 2.0 benchmark. This result surpasses those of established models such as Google's Gemini-3-flash-preview, GPT-4, and Anthropic's Claude 3. The developers have confirmed that no deceptive practices were used to obtain the score, underscoring the agent's genuine capability in handling complex terminal tasks.
IMPACT Demonstrates significant progress in open-source AI agents' ability to autonomously complete complex real-world tasks.
RANK_REASON Open-source model release achieving a notable benchmark result.
Read on Mastodon (mastodon.social)
AI-generated summary · Google Gemini · from 2 sources.