PulseAugur

LLMs distilled for code generation; benchmarks assess execution potential

Researchers are exploring ways to distill the code generation capabilities of large language models (LLMs) into smaller, more accessible models. One study targets the generation of "Game Code World Models" (GameCWMs) for AI agents, using a curated dataset and a new training pipeline to improve smaller models such as Qwen2.5-3B-Instruct. A second paper reviews the trends, challenges, and future directions of LLM-based code generation tasks, highlighting problems with real-world generalization, robustness, and evaluation validity. A third introduces SURGE, a benchmark that assesses the potential of LLMs as general-purpose surrogate code executors across a range of programming tasks and complexity levels.
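To make the "surrogate code executor" idea concrete, here is a minimal Python sketch of how such an evaluation could be scored: run each snippet to get its real output, ask a predictor for the output without running it, and count exact matches. This is an illustration only, not SURGE's actual protocol or metric. The names `run_snippet`, `exact_match_accuracy`, `stub_predictor`, and `SNIPPETS` are hypothetical, and the stub stands in for a real model call.

```python
import contextlib
import io


def run_snippet(source: str) -> str:
    """Execute a trusted Python snippet and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # assumption: snippets are trusted; never do this with untrusted code
    return buffer.getvalue().strip()


def exact_match_accuracy(snippets, predict_output) -> float:
    """Fraction of snippets whose predicted output equals the real output."""
    if not snippets:
        return 0.0
    hits = sum(predict_output(src).strip() == run_snippet(src) for src in snippets)
    return hits / len(snippets)


def stub_predictor(source: str) -> str:
    """Hypothetical stand-in for a model call; it always answers '4'."""
    return "4"


# Toy snippets invented for this sketch, not taken from any benchmark.
SNIPPETS = ["print(2 + 2)", "print(sum(range(5)))", "print('ab' * 2)"]

if __name__ == "__main__":
    print(exact_match_accuracy(SNIPPETS, stub_predictor))
```

Exact string match is the simplest possible scoring rule. A real benchmark may score differently, for example by tolerating formatting differences or by handling programs that raise errors.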

Impact: New benchmarks and distillation methods could make advanced code generation more accessible and reliable for AI development.

Ranking rationale: The cluster consists of three arXiv papers on LLM capabilities in code generation, world model creation, and execution prediction, including new benchmarks and distillation techniques.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.AI TIER_1 Deutsch(DE) · Tyrone Serapio, Arjun Prakash, Haoyang Xu, Kevin Wang, Amy Greenwald

    Distilling Game Code World Model Generation into Lightweight Large Language Models

    arXiv:2605.24375v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown great ability in generating executable code from natural language, opening the possibility of automatically constructing environments for AI agents. Recent work on Code World Models (CWMs) dem…

  2. arXiv cs.AI TIER_1 English(EN) · Muslim Chochlov, Michael English, Jim Buckley

    A Tertiary Review of Large Language Model-Based Code Generating Tasks: Trends, Challenges, and Future Directions

    arXiv:2605.25536v1 Announce Type: cross Abstract: Context. Large language models (LLMs) are increasingly applied to code-generating tasks (CGTs) in software engineering. While reported results are promising, the broader effects of such application and their integration into real-…

  3. arXiv cs.CL TIER_1 English(EN) · Bohan Lyu, Siqiao Huang, Zichen Liang

    SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

    arXiv:2502.11167v5 Announce Type: replace-cross Abstract: Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as generation and understanding. Howeve…