Brief · PulseAugur

TOOL · arXiv cs.CL English (EN) · 7h

Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO

Researchers have developed a method to improve the accuracy of large language models (LLMs) on heart-related medical questions. The approach combines Group Relative Policy Optimization (GRPO) with a Variance-Aware Reward Framework, which provides richer optimization signals from sparse, multi-criteria rubric feedback and leads to more stable reinforcement learning. The method significantly boosted accuracy and F1 scores on a heart-focused medical question-answering benchmark, outperforming the base model and remaining competitive with a much larger model.
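For readers unfamiliar with GRPO, the following is a minimal sketch of the standard group-relative advantage it is generally built on, not the paper's Variance-Aware Reward Framework, whose details this brief does not give. The function name, the `eps` constant, and the rubric scores are all hypothetical and chosen only for illustration. It shows why sparse feedback is a problem: when every sampled answer in a group receives the same score, the advantages collapse to zero and the group contributes no learning signal.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (standard GRPO-style).

    `eps` is an illustrative constant that avoids division by zero
    when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric scores for four sampled answers to one question.
mixed_group = [0.0, 0.5, 0.5, 1.0]    # answers differ, so there is signal
uniform_group = [0.5, 0.5, 0.5, 0.5]  # identical scores, so no signal

mixed_adv = group_relative_advantages(mixed_group)
uniform_adv = group_relative_advantages(uniform_group)

print(mixed_adv)
print(uniform_adv)
```

In the uniform group every advantage is exactly zero, which is the kind of low-variance case a variance-aware reward design would presumably aim to handle; how the paper does so is not stated here.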

IMPACT Enhances LLM capabilities in specialized medical domains, potentially improving diagnostic support and patient information access.

GPT-OSS-120B
Qwen3-14B
Large Language Models
Group Relative Policy Optimization
HealthBench
Variance-Aware Reward Framework