AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

LLM-Driven Compiler Speeds Up Transformer Inference on CUDA

By the PulseAugur editorial team · [1 source] · 2026-06-09 04:00

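Researchers have developed AgentCompile, a new compiler that uses large language models (LLMs) to optimize Transformer inference on CUDA. AgentCompile treats LLM output as advisory metadata that guides decisions about specialization and about which CUDA implementation to select. The approach is reported to deliver substantial gains: average inference speedups over PyTorch eager of 5.66x for Qwen3-1.7B, 4.05x for Qwen3-4B, and 4.26x for Llama-3.2-1B-Instruct.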
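To put the reported figures in more familiar terms, a speedup of s means a run takes 1/s of the baseline time. The sketch below converts the three reported averages into approximate reductions in run time. It uses only the numbers quoted in the summary; the helper name `latency_reduction_pct` is made up for illustration, and treating an average speedup as if it applied to a single workload is a simplifying assumption, since an average of per-workload speedups need not equal the speedup in total time.

```python
# Reported average speedups over PyTorch eager, as quoted in the summary.
REPORTED_SPEEDUPS = {
    "Qwen3-1.7B": 5.66,
    "Qwen3-4B": 4.05,
    "Llama-3.2-1B-Instruct": 4.26,
}


def latency_reduction_pct(speedup: float) -> float:
    """Percent reduction in run time implied by a speedup factor.

    Hypothetical helper: assumes the speedup applies uniformly to one workload.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


if __name__ == "__main__":
    for model, speedup in REPORTED_SPEEDUPS.items():
        pct = latency_reduction_pct(speedup)
        print(f"{model}: {speedup:.2f}x faster, roughly {pct:.1f}% less time per run")
```

Under that assumption, a 5.66x speedup corresponds to roughly an 82% cut in run time, while 4.05x and 4.26x correspond to roughly 75% and 77%.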

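Impact: This compiler technique could significantly improve the efficiency and speed of running large language models on specialized hardware.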



AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Xuanzhe Li, Ziyan Weng, Zhiyu Zhu, Junhui Hou · 2026-06-09 04:00

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

arXiv:2606.07665v1 Announce Type: cross Abstract: Transformer inference increasingly depends on specialized compiler and runtime support, but real model graphs still require semantic decisions about which regions are worth specializing and which CUDA implementation families are p…
