AI inference
PulseAugur coverage of AI inference — every cluster mentioning AI inference across labs, papers, and developer communities, ranked by signal.
3 天有情绪数据
-
Modal achieves serverless GPUs for AI inference in seconds
Modal has developed a system to achieve truly serverless GPUs for AI inference, addressing the challenge of rapidly scaling resources to meet variable demand. Their approach involves maintaining cloud buffers of idle GP…
-
New FastKernels benchmark targets GPU kernel generation for LLMs
Researchers have introduced FastKernels, a new benchmark designed to better evaluate GPU kernel generation agents used in production LLM inference. Existing benchmarks are misaligned with real-world systems, leading age…
-
Modified RTX 2080 Ti GPUs run Qwen 3.6 AI model at 38 tokens/sec
An enthusiast has modified NVIDIA GeForce RTX 2080 Ti graphics cards to run the Qwen 3.6 27B AI model at 38 tokens per second. This setup utilizes older hardware, demonstrating that advanced AI inference is achievable w…
-
Startup SPAN pitches home-based mini data centers for AI compute
A startup called SPAN is piloting a plan to deploy thousands of mini data centers in residential homes to increase AI compute capacity. These distributed nodes, equipped with liquid-cooled Nvidia GPUs, aim to provide co…
-
AI inference split into human-facing and agentic workloads
Ben Thompson proposes a new framework for understanding AI inference workloads, dividing them into "answer inference" and "agentic inference." Answer inference, which requires immediate human feedback, will continue to …
-
Aria Networks CEO: AI inference reshapes data center networking
Aria Networks, an AI networking startup, argues that the network is becoming a critical component in AI infrastructure, moving beyond its traditional role. The company's CEO, Mansour Karam, emphasizes that optimizing fo…