A new benchmark reveals that frontier large language models degrade approximately 25% of documents during extended workflows. Separately, a Fields Medal winner has reported that ChatGPT 5.5 Pro is capable of solving complex PhD-level mathematics problems.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT A new benchmark highlights potential data corruption issues with frontier LLMs, while advanced models demonstrate capabilities in complex academic domains.
RANK_REASON The cluster contains a new benchmark result and a report on model capabilities, fitting the research category.