Researchers adapt LLM for Brazilian healthcare with synthetic data and RL

By PulseAugur Editorial · [1 source] · 2026-05-05 04:00

Researchers have developed a method for adapting large language models to Brazilian healthcare by injecting knowledge from official clinical guidelines. They generated a synthetic dataset of over 70 million tokens from 178 guidelines and used it to fine-tune Qwen2.5-14B-Instruct, a 14-billion-parameter model. The adapted model scored highly on two new benchmarks, HealthBench-BR and PCDT-QA, outperforming several leading commercial models despite its smaller size. The team has released the datasets, benchmarks, and model weights to support further research in clinical NLP for Brazilian Portuguese.

IMPACT This work could improve the accuracy and relevance of LLMs for specific, non-English clinical domains, potentially aiding healthcare professionals in Brazil.

RANK_REASON This is a research paper detailing the creation of a new dataset and benchmark for clinical NLP in Brazilian Portuguese, along with a fine-tuned model.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Hugo Abonizio, Filipe Rocha Lopes, Roberto Lotufo, Rodrigo Nogueira · 2026-05-05 04:00

Teaching LLMs Brazilian Healthcare: Injecting Knowledge from Official Clinical Guidelines

arXiv:2605.01077v1 · Abstract: Brazil's Unified Health System (SUS) relies on official clinical guidelines that define diagnostic criteria, treatments, dosages, and monitoring procedures for over 200 million citizens. Yet current LLMs perform poorly on this guide…
