A new benchmark reveals that frontier large language models degrade approximately 25% of documents during extended workflows. Separately, a Fields Medal winner has reported that ChatGPT 5.5 Pro is capable of solving complex PhD-level mathematics problems.
IMPACT: A new benchmark highlights potential data-corruption issues in frontier LLMs, while advanced models demonstrate strong capabilities in complex academic domains such as mathematics.
RANK_REASON: The cluster contains a new benchmark result and a report on model capabilities, fitting the research category.
Source: Mastodon (sigmoid.social).
AI-generated summary by Google Gemini, from 1 source.