Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
Researchers have introduced Multi-LCB, a benchmark that evaluates large language models (LLMs) on code generation across twelve programming languages, extending the Python-only LiveCodeBench (LCB). Multi-LCB converts LCB's Python tasks into equivalent tasks in the other languages while keeping LCB's contamination controls and evaluation protocols. Initial evaluations of 24 LLMs revealed significant overfitting to Python, language-specific contamination issues, and notable performance disparities between languages.
AI IMPACT Highlights critical gaps in LLM multilingual coding capabilities and the need for models to generalize beyond Python.