Stateful Transformers Speed Up Streaming Inference; Intel Releases the AutoRound Quantization Toolkit

By the PulseAugur editorial team · [4 sources] · 2026-04-29 13:19

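A new paper introduces a stateful Transformer inference engine that maintains a persistent KV cache, which substantially speeds up the processing of streaming data. The approach achieves query latency that is independent of the accumulated context size, and it ran 5.9× faster than existing engines on a market-data benchmark. Separately, Intel released AutoRound, an advanced quantization toolkit for LLMs and VLMs that achieves high accuracy at ultra-low bit widths (2-4 bits), offers broad hardware compatibility, and integrates with popular frameworks such as vLLM and Transformers.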
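
The caching idea in the summary above can be illustrated with a toy sketch. This is not the paper's engine: `StatefulAttention`, `ingest`, `query`, and `stateless_query` are hypothetical names, the model is a single attention head in pure Python, and the weights are identity matrices chosen only to keep the demo readable. It shows one thing: keys and values are projected once when a token arrives, so a later query does not repeat that prefill work.

```python
import math

def project(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

class StatefulAttention:
    """Hypothetical toy engine that keeps K/V for every token seen so far."""

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.keys, self.values = [], []
        self.tokens_projected = 0  # bookkeeping for the demo only

    def ingest(self, token):
        # Runs once per arriving token; the result stays in the cache.
        self.keys.append(project(token, self.w_k))
        self.values.append(project(token, self.w_v))
        self.tokens_projected += 1

    def query(self, q_token):
        # Only the query is projected here; cached K/V are reused as-is.
        return attend(project(q_token, self.w_q), self.keys, self.values)

def stateless_query(stream, q_token, w_q, w_k, w_v):
    """Request-driven baseline: re-projects the whole stream on every query."""
    keys = [project(t, w_k) for t in stream]
    values = [project(t, w_v) for t in stream]
    return attend(project(q_token, w_q), keys, values)

# Demo with made-up 2-dimensional tokens and identity weights (assumptions).
identity = [[1.0, 0.0], [0.0, 1.0]]
stream = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
engine = StatefulAttention(identity, identity, identity)
for tok in stream:
    engine.ingest(tok)

q = [0.5, -0.5]
cached = engine.query(q)
recomputed = stateless_query(stream, q, identity, identity, identity)
```

Both paths return the same attention output, but `engine.query` can be called repeatedly without touching the stream again. Note that this toy still scans every cached entry per query, so it does not reproduce the context-size-independent latency the paper reports; it only demonstrates avoiding the repeated prefill.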

Impact: New inference techniques and quantization methods reduce compute costs, which may help large models be deployed more widely.

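Ranking rationale: This cluster contains an academic paper detailing a new inference technique and a software toolkit for model quantization.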

AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.LG TIER_1 English(EN) · Victor Norgren · 2026-05-13 17:06

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

Conventional transformer inference engines are request-driven, paying an O(n) prefill cost on every query. In streaming workloads, where data arrives continuously and queries probe an ever-growing context, this cost is prohibitive. We introduce a data-driven computational model c…
Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 13:43

Advanced Quantization Algorithm for LLMs https://github.com/intel/auto-round #HackerNews #AdvancedQuantization #LLMs #MachineLearning #AI #Research #Intel

Link: github.com/…/auto-round
Mastodon — mastodon.social TIER_1 English(EN) · [email protected] · 2026-05-01 09:10

Advanced Quantization Algorithm for LLMs https://github.com/intel/auto-round #HackerNews #Tech #AI

Link: github.com/…/auto-round
Mastodon — mastodon.social TIER_1 English(EN) · rmathew · 2026-04-29 13:19

An excellent introduction to #quantization used for #LLMs 👌🏽: “Quantization From The Ground Up”, Sam Rose, Ngrok (https://ngrok.com/blog/quantization). On HN: https://news.ycombinator.com/item?id=47519295 #AI #Math #FloatingPoint #NumericalAnalysis #Numbers #NeuralNe…

Link: ngrok.com/…/quantization
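
For readers new to the topic these posts cover, here is a minimal sketch of the simplest form of weight quantization: symmetric round-to-nearest over one group of weights. It is a generic baseline written for illustration, not AutoRound's algorithm or API and not code from the linked article; the function names `quantize_rtn` and `dequantize` and the sample weights are assumptions.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns integer codes in [-(2**(bits-1)), 2**(bits-1) - 1] plus the
    floating-point scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds exact halves to the nearest even integer.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]

# Made-up example weights (an assumption, chosen to avoid rounding ties).
weights = [0.7, -0.32, 0.11, 0.0]
codes, scale = quantize_rtn(weights, bits=4)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

Each weight is stored as a 4-bit code plus one shared scale per group, and the reconstruction error of any weight is at most half the scale. Lower bit widths shrink the code range, which is why methods aimed at 2-4 bits put effort into choosing the rounding and scale more carefully than this baseline does.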
