RTLC prompting boosts LLM judge accuracy by 14 percentage points

By PulseAugur Editorial · [2 sources] · 2026-05-13 15:48

Researchers have introduced RTLC (Research, Teach-to-Learn, Critique), a three-stage prompting technique that improves the accuracy of large language models used as judges. Inspired by the Feynman Learning Technique, the method improves a single LLM's performance without fine-tuning or external tools. Applied to Claude 3.7 Sonnet on the JudgeBench-GPT dataset, RTLC raised pairwise accuracy from 64.6% to 78.6%, a gain of 14.0 percentage points, outperforming ensemble methods.
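To make the three-stage idea concrete, here is a minimal Python sketch of how a Research, Teach-to-Learn, Critique chain might be wired around a single model. The sources covered here name the stages but do not give the paper's prompts, so the prompt wording, the `rtlc_judge` and `pairwise_accuracy` names, and the `llm` callable are all hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, Sequence

# Hypothetical stage prompts: the stage names come from the summary,
# but the wording below is an assumption made for illustration only.
RESEARCH = (
    "Question:\n{question}\n\n"
    "List the facts and reasoning steps needed to answer it correctly."
)
TEACH = (
    "Notes:\n{notes}\n\n"
    "Explain in plain language how to solve the question, as if teaching a beginner."
)
CRITIQUE = (
    "Question:\n{question}\n\nExplanation:\n{lesson}\n\n"
    "Answer A:\n{a}\n\nAnswer B:\n{b}\n\n"
    "Critique both answers against the explanation, "
    "then finish with 'Verdict: A' or 'Verdict: B'."
)


def rtlc_judge(llm: Callable[[str], str], question: str, a: str, b: str) -> str:
    """Run three sequential calls to one model and return 'A' or 'B'."""
    notes = llm(RESEARCH.format(question=question))
    lesson = llm(TEACH.format(notes=notes))
    critique = llm(CRITIQUE.format(question=question, lesson=lesson, a=a, b=b))

    # Read the letter that follows the last "Verdict:" marker in the reply.
    text = critique.upper()
    marker = text.rfind("VERDICT:")
    if marker == -1:
        raise ValueError("no verdict found in critique output")
    verdict = text[marker + len("VERDICT:"):].strip()[:1]
    if verdict not in ("A", "B"):
        raise ValueError("verdict must be 'A' or 'B'")
    return verdict


def pairwise_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of pairs where the judge picked the labelled winner."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```

Here `llm` stands for any function that takes a prompt string and returns the model's text reply. Under this metric, the reported change from 64.6% to 78.6% corresponds to the 14.0-point gain in the headline.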

IMPACT If the gains hold beyond JudgeBench, a prompt-only technique like RTLC could make LLM-as-judge evaluation more reliable, which in turn would support more trustworthy benchmarks and faster model development.

RANK_REASON The cluster describes a new research paper detailing a novel prompting technique for LLMs.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Andrea Morandi · 2026-05-13 15:48

RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning

LLM-as-a-judge is now the default measurement instrument for open-ended generation, but on the public JudgeBench benchmark even strong instruction-tuned judges barely scrape past random on objective-correctness pairwise items. We introduce RTLC, a three-stage prompting recipe -- …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-13 15:48

RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning

LLM-as-a-judge is now the default measurement instrument for open-ended generation, but on the public JudgeBench benchmark even strong instruction-tuned judges barely scrape past random on objective-correctness pairwise items. We introduce RTLC, a three-stage prompting recipe -- …
