Towards the Readability of LLM-Generated Codes through Multitask Representation Engineering
Researchers are developing new benchmarks and techniques to evaluate and improve Large Language Models (LLMs) for code generation and translation. One study introduces a multilingual, execution-grounded evaluation of open code LLMs. It reveals that current models lag significantly behind human performance and that their performance varies across programming languages and problem types. Another benchmark, CodeTaste, focuses on refactorings produced by LLMs. It shows a gap between carrying out a refactoring that has been explicitly specified and discovering the refactoring a human developer actually chose. Other work aims to improve the readability of generated code through multitask representation engineering and to build better datasets for code translation, especially in low-resource programming domains. Tools such as src2md are also emerging to help fit large codebases into LLM context windows so that models can analyze them.
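To make "execution-grounded" concrete: such evaluations judge generated code by running it against tests rather than by comparing its text to a reference solution. The sketch below is a minimal illustration under simplifying assumptions, not the study's harness. Candidates are plain Python callables instead of generated source in several languages, the helper names `passes_tests` and `pass_rate` and the toy addition task are hypothetical, and a real harness would execute untrusted code in a sandbox with time limits.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical helpers for illustration; not taken from the study described above.
TestCase = Tuple[tuple, object]  # (positional arguments, expected result)


def passes_tests(candidate: Callable, tests: Iterable[TestCase]) -> bool:
    """Return True only if the candidate returns the expected value on every test.

    Any exception raised while running a test counts as a failure.
    """
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True


def pass_rate(candidates: List[Callable], tests: Iterable[TestCase]) -> float:
    """Fraction of candidates that pass all tests (0.0 when there are no candidates)."""
    tests = list(tests)  # materialize so every candidate sees the same tests
    results = [passes_tests(c, tests) for c in candidates]
    return sum(results) / len(results) if results else 0.0


# Two hypothetical "generated" solutions to the task "add two numbers".
def add_correct(a, b):
    return a + b


def add_buggy(a, b):
    return a - b  # deliberate bug: subtracts instead of adding


tests = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]
print(pass_rate([add_correct, add_buggy], tests))  # 0.5
```

Scoring by execution rewards any solution that behaves correctly, even if it looks nothing like a reference answer. It also catches code that looks plausible but fails at run time, as `add_buggy` does here.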
IMPACT: New evaluation methodologies and tools are emerging to better assess and improve LLM capabilities in code generation, refactoring, and translation, and to expose critical limitations of current models.