AI Giants Fail Real-World Application Test, Scoring Below 25%

By PulseAugur Editorial · [2 sources] · 2026-06-15 12:25

A new benchmark led by researchers at the University of California, Berkeley, found that leading AI models struggle with real-world applications, with none passing more than 25% of the tasks. OpenAI's GPT-5.5 achieved the highest score with a 24% pass rate, followed closely by Anthropic's Claude Fable 5 at 22%. Other prominent models, including Google Gemini, DeepSeek, and Grok, scored below 16% on tasks ranging from audio processing to theoretical physics.

IMPACT Highlights significant limitations in current AI capabilities for real-world tasks, suggesting a gap between theoretical performance and practical application.

RANK_REASON The cluster reports on a new benchmark and its results, which is a research output from a university.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] · 2026-06-15 12:25

AI giants score below 25% in UC Berkeley-led test of real-world application | Campus https://www.byteseu.com/2109375/ #Agents’LastExam #AI #ale #Anthropic #ArtificialIntelligence #BenjaminLiu #ChristineParlour #ClaudeFable5 #DawnSong #DecentralizedIntelligence #DeepS…

LINKS byteseu.com/2109375
r/OpenAI TIER_2 English(EN) · /u/the_daily_cal · 2026-06-15 23:10

AI giants score below 25% in UC Berkeley-led test of real-world application

LINKS reddit.com/r/OpenAI/comments/1u6wkhf/ai_giants_score_below_25_in_uc_berkeleyled_test/
