OpenAI has introduced HealthBench, a new benchmark designed to evaluate the performance of AI systems in health-related scenarios. The benchmark was developed in collaboration with 262 physicians from 60 countries and features 5,000 realistic health conversations. Each conversation includes a custom rubric, written by physicians, for assessing AI responses. The goal is to ensure that AI models are both useful and safe when used to improve human health.
Summary written by gemini-2.5-flash-lite from 1 source.