PulseAugur / Brief

Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
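The scoring formula is not given here, so the following is only a minimal sketch of how four such signals could be folded into a 0–100 score. Everything in it is an assumption made for illustration: the `Signals` type, the equal `WEIGHTS`, the 12-hour `HALF_LIFE_HOURS`, and the choice of exponential decay are hypothetical and are not taken from PulseAugur.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-story inputs; the first three are expected in [0, 1]."""
    authority: float
    cluster_strength: float
    headline_signal: float
    age_hours: float


# Assumed weights and half-life, chosen only for illustration.
WEIGHTS = {
    "authority": 0.25,
    "cluster_strength": 0.25,
    "headline_signal": 0.25,
    "time_decay": 0.25,
}
HALF_LIFE_HOURS = 12.0


def time_decay(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay: 1.0 for a brand-new story, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life)


def score(s: Signals) -> float:
    """Weighted mean of the four signals, scaled to 0-100."""
    parts = {
        "authority": s.authority,
        "cluster_strength": s.cluster_strength,
        "headline_signal": s.headline_signal,
        "time_decay": time_decay(s.age_hours),
    }
    # Clamp each signal to [0, 1] so the result stays inside 0-100.
    total = sum(WEIGHTS[k] * min(max(v, 0.0), 1.0) for k, v in parts.items())
    return round(100 * total / sum(WEIGHTS.values()), 1)
```

Under these assumptions, a story with perfect authority, cluster strength, and headline signal loses an eighth of its score after one half-life, because only the time-decay component has dropped.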

  1. DiffusionGemma: How Google's New Open LLM Hits 1,000 Tokens/sec and Changes Inference Economics

    Google DeepMind has released DiffusionGemma, an open-weight LLM that uses a diffusion architecture for text generation, enabling significantly faster inference than traditional autoregressive models. The model can generate up to 1,000 tokens per second on a single H100 GPU and requires only 18 GB of VRAM, which makes it practical for single-GPU deployments. It trades some accuracy for speed, but it excels at tasks such as code infilling and real-time applications, and it also supports multimodal inputs, including images and video.

    IMPACT: Faster inference and lower VRAM requirements could enable new real-time applications and wider single-GPU deployments.
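
To make the "inference economics" in the headline concrete, the reported peak throughput can be turned into a rough cost per million tokens. This is a back-of-the-envelope sketch. The 1,000 tokens/sec figure comes from the summary above, while the GPU hourly price is a hypothetical input supplied by the reader, not a quoted rate, and sustained real-world throughput may be lower than the peak.

```python
def tokens_per_hour(tokens_per_second: float) -> float:
    """Tokens generated in one hour at a constant rate."""
    return tokens_per_second * 3600


def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """USD per one million generated tokens on a single fully utilized GPU.

    gpu_hourly_usd is an assumed rental price; substitute your own figure.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return gpu_hourly_usd / tokens_per_hour(tokens_per_second) * 1_000_000


# At the reported peak of 1,000 tokens/sec, one GPU yields 3.6 million tokens per hour.
peak_hourly_tokens = tokens_per_hour(1_000)

# Hypothetical example: an assumed price of 3.00 USD per GPU-hour.
example_cost = cost_per_million_tokens(gpu_hourly_usd=3.00, tokens_per_second=1_000)
```

Because cost scales inversely with throughput, halving the sustained token rate doubles the cost per token at the same hourly price.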