Beyond English benchmarks: clinical LLM evaluation in Brazilian Portuguese
Researchers have developed ClinicalBr, a bilingual benchmark for evaluating clinical large language models (LLMs) in Brazilian Portuguese and English. The benchmark, derived from real Brazilian medical case reports, covers tasks such as diagnosis retrieval, differential diagnosis, and treatment planning. Initial findings indicate that models show an advantage in English on diagnosis retrieval, but this gap narrows on the other tasks, where Portuguese performance sometimes exceeds English.
AI IMPACT: Establishes a new evaluation standard for clinical LLMs in non-English languages, potentially improving the accessibility and performance of these models worldwide.