Researchers at METR ran experiments to measure how post-training enhancements affect AI agent capabilities, using versions of OpenAI's GPT-3.5 Turbo and GPT-4. They found that OpenAI's own post-training boosted agent performance by 26 percentage points, a gain comparable to the jump from GPT-3.5 to GPT-4. METR's own attempts to improve agent performance further through tweaked prompting and tools yielded smaller, statistically insignificant gains. The study suggests that dramatically increasing a model's dangerous capabilities after it has already been competently fine-tuned may be difficult, though further research is needed.
Summary written by gemini-2.5-flash-lite from 1 source.