PulseAugur

Stateful Transformers boost streaming inference; Intel releases AutoRound quantization toolkit

A new paper introduces a stateful transformer inference engine that speeds up streaming workloads by maintaining a persistent KV cache, so queries do not repeat the prefill over the accumulated context. Query latency is therefore independent of accumulated context size, and the engine achieves up to a 5.9x speedup on market-data benchmarks compared with existing engines. Separately, Intel has released AutoRound, a quantization toolkit for LLMs and VLMs that targets high accuracy at ultra-low bit widths (2-4 bits), supports a broad range of hardware, and integrates with frameworks such as vLLM and Transformers.
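
The persistent-cache idea can be illustrated with a toy sketch. This is not the paper's engine: it is a single attention head in plain Python, and the names `StreamingKVCache`, `ingest`, `attend`, and `stateless_attend` are hypothetical. It also assumes each arriving token already comes with its key and value vectors, whereas a real model would compute them from hidden states. The sketch only shows that keeping keys and values across queries removes the need to rebuild them on every request; the dot products in `attend` still scale with cache length, so it does not reproduce the context-independent latency reported in the paper.

```python
import math


class StreamingKVCache:
    """Toy single-head attention state that persists across queries."""

    def __init__(self, dim):
        self.dim = dim
        self.keys = []    # one key vector per ingested token
        self.values = []  # one value vector per ingested token

    def ingest(self, key, value):
        # Called once per arriving token; earlier entries are never recomputed.
        assert len(key) == self.dim and len(value) == self.dim
        self.keys.append(list(key))
        self.values.append(list(value))

    def attend(self, query):
        # Scaled dot-product attention of one query over everything cached so far.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(self.dim)
                  for key in self.keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        return [sum(w * v[i] for w, v in zip(weights, self.values)) / total
                for i in range(self.dim)]


def stateless_attend(tokens, query, dim):
    # Request-driven baseline: rebuild the whole cache on every query.
    cache = StreamingKVCache(dim)
    for key, value in tokens:
        cache.ingest(key, value)
    return cache.attend(query)
```

Both paths return the same attention output; the difference is that the stateful cache pays the per-token ingestion cost once as data arrives, while the stateless baseline pays it again for every query.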

Impact: New inference techniques and quantization methods reduce computational costs, potentially enabling wider deployment of large models.

Ranking rationale: The cluster contains an academic paper detailing a new inference technique and a software toolkit for model quantization.

AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.LG TIER_1 English(EN) · Victor Norgren ·

    Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

    Conventional transformer inference engines are request-driven, paying an O(n) prefill cost on every query. In streaming workloads, where data arrives continuously and queries probe an ever-growing context, this cost is prohibitive. We introduce a data-driven computational model c…

  2. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    Advanced Quantization Algorithm for LLMs https://github.com/intel/auto-round #HackerNews #AdvancedQuantization #LLMs #MachineLearning #AI #Research #Intel

  3. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    Advanced Quantization Algorithm for LLMs https://github.com/intel/auto-round #HackerNews #Tech #AI

  4. Mastodon — mastodon.social TIER_1 English(EN) · rmathew ·

    An excellent introduction to #quantization used for #LLMs 👌🏽: “Quantization From The Ground Up”, Sam Rose, Ngrok (https://ngrok.com/blog/quantization). On HN: https://news.ycombinator.com/item?id=47519295 #AI #Math #FloatingPoint #NumericalAnalysis #Numbers #NeuralNe…
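
For readers new to the quantization items above, a minimal sketch of the basic idea may help. It uses plain symmetric round-to-nearest quantization of a single weight group, which is a baseline technique and not AutoRound's algorithm; the function names `quantize_rtn` and `dequantize` are hypothetical and do not come from any of the linked projects.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization of one weight group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed integers
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer level, then clamp to the signed range.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    # Map integer codes back to approximate floating-point weights.
    return [c * scale for c in codes]
```

Storing the small integer codes plus one scale per group is what shrinks the model. The reconstruction error per weight is at most half the scale, which is why lower bit widths need more careful rounding schemes than this baseline.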