Researchers have developed Themis-RM, a suite of multilingual code reward models designed for flexible scoring across multiple criteria. These models, ranging from 600M to 32B parameters, were trained on Themis-CodePreference, a dataset of over 350,000 code preference pairs. The accompanying Themis-CodeRewardBench benchmark evaluates code RMs across eight programming languages and five preference dimensions, revealing limitations of current models on criteria beyond functional correctness. Experiments show positive scaling trends and strong cross-lingual transfer, highlighting the value of multi-criteria training.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Introduces new models and a benchmark for training and evaluating code reward models, potentially improving their multilingual and multi-criteria capabilities.
RANK_REASON This is a research paper detailing the creation of new code reward models and a benchmark.