PulseAugur

New Multi-LCB benchmark tests LLMs across 12 programming languages

Researchers have introduced Multi-LCB, a benchmark that evaluates large language models (LLMs) on code generation across twelve programming languages, extending the Python-only LiveCodeBench (LCB). It converts LCB's Python tasks into equivalent tasks in other languages while preserving LCB's contamination controls and evaluation protocols. Initial evaluations of 24 LLMs revealed significant Python overfitting, language-specific contamination, and notable performance disparities across languages, pointing to critical gaps in current models' multilingual coding abilities.

IMPACT Highlights critical gaps in LLM multilingual coding capabilities and the need for models to generalize beyond Python.

RANK_REASON The cluster describes a new benchmark paper published on arXiv.


AI-generated summary · Google Gemini · from 3 sources.


COVERAGE [3]

  1. arXiv cs.AI TIER_1 English(EN) · Maria Ivanova, Pavel Zadorozhny, Rodion Levichev, Ivan Petrov, Adamenko Pavel, Ivan Lopatin, Alexey Kutalev, Dmitrii Babaev

    Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

    arXiv:2606.20517v1. Abstract: LiveCodeBench (LCB) has recently become a widely adopted benchmark for evaluating large language models (LLMs) on code-generation tasks. By curating competitive programming problems, constantly adding fresh problems to the set, and …

  2. arXiv cs.AI TIER_1 English(EN) · Dmitrii Babaev

    Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

    LiveCodeBench (LCB) has recently become a widely adopted benchmark for evaluating large language models (LLMs) on code-generation tasks. By curating competitive programming problems, constantly adding fresh problems to the set, and filtering them by release dates, LCB provides co…

  3. Hugging Face Daily Papers TIER_1 English(EN)

    Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

    Multi-LCB addresses the limitation of LiveCodeBench by providing a multi-language benchmark for evaluating LLMs across twelve programming languages while maintaining contamination controls and evaluation protocols.