Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO
Researchers have developed a new method to improve the accuracy of Large Language Models (LLMs) in answering heart-related medical questions. Their approach uses Group Relative Policy Optimization (GRPO) with a novel Variance-Aware Reward Framework, which provides richer optimization signals when feedback is sparse and spans multiple criteria, leading to more stable reinforcement learning. On a heart-focused medical question-answering benchmark, the method significantly improved accuracy and F1 scores over the base model and remained competitive with a much larger model.
IMPACT: Enhances LLM capabilities in specialized medical domains, potentially improving diagnostic support and patient information access.
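
To make the idea of sparse, multi-criteria feedback concrete, here is a minimal sketch of the standard group-relative advantage used in GRPO: each sampled answer's reward is normalized by the mean and standard deviation of its group. The summary does not describe how the Variance-Aware Reward Framework is built, so nothing below reproduces it. The helper names (`rubric_score`, `group_relative_advantages`), the equal weighting of criteria, and the `eps` constant are illustrative assumptions.

```python
import statistics


def rubric_score(criteria_met):
    """Hypothetical rubric reward: fraction of criteria an answer satisfies.

    Assumption for illustration only; the source does not specify the rubric
    or how its criteria are weighted.
    """
    return sum(criteria_met) / len(criteria_met)


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


# A group of four sampled answers to one question, with mixed rewards.
mixed_group = [1.0, 0.0, 0.0, 1.0]
mixed_advantages = group_relative_advantages(mixed_group)

# A group where every answer gets the same (sparse) reward: the group
# variance is zero, so every advantage is zero and no learning signal remains.
flat_group = [0.0, 0.0, 0.0, 0.0]
flat_advantages = group_relative_advantages(flat_group)

# A graded rubric can separate answers that an all-or-nothing reward would tie.
partial_credit = rubric_score([True, False, True, True])
```

In this sketch, the flat group yields all-zero advantages, which is the failure mode that zero-variance groups create for group-normalized methods. A graded rubric score is one plausible way to restore variation within a group, though whether the authors do this is not stated in the summary.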