PulseAugur
LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning

Researchers have developed a framework that uses large language models (LLMs) as judges to evaluate agentic stock prediction systems. The framework scores system behavior along six dimensions, including regime detection and risk calibration, exposing decision-level quality that aggregate metrics hide. The LLM judges (GPT 5.4, Claude 4.6 Opus, and Gemini 3.1 Pro) agreed closely with one another, and their scores correlated well with realized trading performance. The behavioral evaluation was then fed into a reinforcement learning feedback loop, which the authors report improved both prediction accuracy and trading strategy.
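The scoring-and-feedback idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the judge names, the 0-1 score scale, the averaging rule, and the `shaped_reward` weighting are all assumptions made for illustration, and only the two dimensions named in this summary are included, since the other four are not listed here.

```python
from statistics import mean

# Only these two dimension names appear in the summary above;
# the paper's remaining four dimensions are not listed there.
DIMENSIONS = ["regime_detection", "risk_calibration"]


def aggregate_scores(judge_scores):
    """Average each dimension's score across all judges."""
    return {
        d: mean([s[d] for s in judge_scores.values()])
        for d in DIMENSIONS
    }


def max_disagreement(judge_scores):
    """Largest per-dimension spread (max minus min) across judges."""
    spreads = []
    for d in DIMENSIONS:
        values = [s[d] for s in judge_scores.values()]
        spreads.append(max(values) - min(values))
    return max(spreads)


def shaped_reward(trading_reward, judge_scores, weight=0.1):
    """Hypothetical shaping rule: trading reward plus a weighted behavioral bonus."""
    behavioral = mean(list(aggregate_scores(judge_scores).values()))
    return trading_reward + weight * behavioral


# Made-up scores on an assumed 0-1 scale, for illustration only.
scores = {
    "judge_a": {"regime_detection": 0.8, "risk_calibration": 0.6},
    "judge_b": {"regime_detection": 0.7, "risk_calibration": 0.5},
    "judge_c": {"regime_detection": 0.9, "risk_calibration": 0.7},
}

print(aggregate_scores(scores))
print(max_disagreement(scores))
print(shaped_reward(1.0, scores))
```

Keeping per-dimension scores separate until the final step is the point of the sketch: a low score on one dimension stays visible instead of being averaged away into a single number.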

Impact: Introduces a new method for evaluating and improving AI agents in complex decision-making tasks such as financial prediction.

Ranking rationale: Academic paper detailing a new evaluation framework for AI systems.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.LG · English (EN) · Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman

    Multi-Dimensional Behavioral Evaluation of Agentic Stock Prediction Systems Using LLM Judges with Closed-Loop Reinforcement Learning Feedback

    arXiv:2605.05739v1 Announce Type: new Abstract: Agentic stock prediction systems make sequences of interdependent decisions (regime detection, pathway routing, reinforcement learning control) whose individual quality is hidden by aggregate metrics such as mean absolute percentage…