graphics processing unit
PulseAugur coverage of the graphics processing unit (GPU): every cluster mentioning GPUs across labs, papers, and developer communities, ranked by signal.
- competes with central processing unit 70%
- used by H.1000 Gnome 70%
- competes with application-specific integrated circuit 70%
- used by Dohuk Polytechnic University 70%
- used by Vulkan 70%
- uses data processing unit 70%
- competes with Cerebras Systems 70%
- competes with Tensor Processing Unit 60%
- uses central processing unit 50%
- used by central processing unit 50%
- used by Cerebras Systems 50%
- competes with data processing unit 50%
18 days of sentiment data
-
Detecting GPU Waste in Kubernetes Clusters
This article discusses how to identify and address GPU waste within Kubernetes clusters, a problem that often goes unnoticed due to seemingly healthy utilization metrics. It highlights that inefficient GPU usage can occ…
-
Chinese AI startups secure over $15B in Q1 funding
In the first quarter, the AI sector saw over 110 billion yuan in funding, with domestic large language models experiencing a significant surge. Companies like Moonshot AI and Jueyue Xingchen secured over 30 billion yuan…
-
LLM inference: CPU vs GPU trade-offs detailed for local deployments
This article explores the practical differences between CPU and GPU inference for large language models (LLMs) using the llama.cpp framework. It highlights that while GPUs offer superior speed, CPUs can be a viable alte…
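A common rule of thumb for why GPUs usually win on speed: single-stream token generation tends to be memory-bandwidth bound, because each new token requires reading roughly all model weights once. The sketch below is that back-of-envelope ceiling under stated assumptions, not a figure from the article or from llama.cpp. The function name and the hardware numbers in the usage note are hypothetical, and the estimate ignores KV-cache reads, compute limits, and batching.

```python
def decode_tokens_per_second_upper_bound(
    param_count: float, bytes_per_param: float, mem_bandwidth_gb_s: float
) -> float:
    """Rough ceiling on single-stream decode speed (hypothetical helper).

    Assumption: every generated token streams all weights from memory once,
    so throughput <= memory bandwidth / model size in bytes.
    Ignores KV-cache traffic, compute limits, caching effects, and batching.
    """
    model_bytes = param_count * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / model_bytes


# Hypothetical comparison: a 7B-parameter model quantized to ~4 bits (0.5 bytes/param).
cpu_bound = decode_tokens_per_second_upper_bound(7e9, 0.5, 50.0)    # ~50 GB/s system RAM
gpu_bound = decode_tokens_per_second_upper_bound(7e9, 0.5, 1000.0)  # ~1 TB/s GPU memory
print(f"CPU ceiling: {cpu_bound:.1f} tok/s, GPU ceiling: {gpu_bound:.1f} tok/s")
```

With these assumed figures (about 3.5 GB of weights), the ceiling is roughly 14 tokens/s at 50 GB/s versus roughly 286 tokens/s at 1,000 GB/s. This is why a CPU can still be usable for small, heavily quantized models while falling far behind on larger ones.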
-
Anyscale details Ray Data for scaling multimodal AI data pipelines
Anyscale's blog post details challenges in scaling multimodal AI data pipelines, where preprocessing often starves GPUs, leading to underutilization. The article explains that traditional staged batch execution, which i…
-
AI pricing shifts to flexible models amid rising hardware and operational costs
The existing fixed pricing models for AI services are becoming unsustainable due to rising inference costs and increased usage. Surging prices for GPUs and High Bandwidth Memory (HBM), coupled with higher power and cool…
-
Modal launches autoscaling GPUs for AI research agents
Modal has introduced an autoscaling feature for GPUs designed to support AI research agents. This new capability allows agents to dynamically provision and release compute resources as needed, addressing the challenge o…
-
Modal achieves serverless GPUs for AI inference in seconds
Modal has developed a system to achieve truly serverless GPUs for AI inference, addressing the challenge of rapidly scaling resources to meet variable demand. Their approach involves maintaining cloud buffers of idle GP…
-
Anker launches AI chip, Poland seeks EU AI factory, California passes automation law
Anker is entering the processor market with its new Thus chip, which uses compute-in-memory architecture to deliver 150x more AI processing power for its upcoming Soundcore headphones. Meanwhile, Poland is vying for a B…
-
Shenmou targets wireless cameras with ultra-low-power chips
Shenmou, led by Yang Zuoxing, is developing ultra-low-power chip designs to free cameras from wires, envisioning a future with billions of smart visual terminals. Their first-generation chip achieves one-third the indus…
-
Stanford's ThunderKittens DSL optimizes AI kernel performance
A new article details ThunderKittens, a compact domain-specific language (DSL) developed at Stanford's Hazy Research Lab for creating high-performance AI kernels. The DSL aims to strike a balance between research produc…
-
LLM reliability and cost-efficiency drive new infrastructure solutions
The integration of Large Language Models (LLMs) into professional workflows is shifting from experimental use to essential tooling, emphasizing collaboration rather than automation. However, the reliability of these LLM…
-
New FastKernels benchmark targets GPU kernel generation for LLMs
Researchers have introduced FastKernels, a new benchmark designed to better evaluate GPU kernel generation agents used in production LLM inference. Existing benchmarks are misaligned with real-world systems, leading age…
-
WarmServe system prewarms GPUs for faster multi-LLM serving
Researchers have developed WarmServe, a new system designed to improve the efficiency of serving multiple large language models (LLMs) on shared GPU clusters. WarmServe utilizes a one-for-many GPU prewarming strategy, p…
-
FlashSinkhorn solver accelerates optimal transport on GPUs
Researchers have developed FlashSinkhorn, a new GPU-accelerated solver for entropic optimal transport (EOT) that significantly reduces memory input/output operations. By rewriting stabilized log-domain Sinkhorn updates …
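For context, here is a minimal pure-Python sketch of the standard stabilized log-domain Sinkhorn iteration for entropic optimal transport, which is the kind of update FlashSinkhorn is described as rewriting. It is not FlashSinkhorn's GPU kernel and makes no attempt at the memory-I/O optimizations the summary mentions. The function names are hypothetical. The potentials `f` and `g` are updated with a log-sum-exp so that `exp(-cost/eps)` never underflows for small `eps`.

```python
import math
from typing import List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) using the max-shift trick."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_domain_sinkhorn(a: List[float], b: List[float], cost: List[List[float]],
                        eps: float, n_iters: int = 200) -> List[List[float]]:
    """Entropic OT plan via log-domain Sinkhorn (hypothetical helper).

    a, b: positive marginals, each summing to 1.
    cost: len(a) x len(b) cost matrix; eps: entropic regularization strength.
    Returns P with P[i][j] = exp((f[i] + g[j] - cost[i][j]) / eps).
    """
    n, m = len(a), len(b)
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    f = [0.0] * n  # dual potential for rows
    g = [0.0] * m  # dual potential for columns
    for _ in range(n_iters):
        # Row update: with g fixed, makes the plan's row sums equal a.
        f = [eps * log_a[i]
             - eps * logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        # Column update: with the new f fixed, makes column sums equal b.
        g = [eps * log_b[j]
             - eps * logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)]
            for i in range(n)]


plan = log_domain_sinkhorn([0.2, 0.8], [0.6, 0.4], [[0.0, 1.0], [2.0, 0.5]], eps=0.5)
```

After each column update the column sums match `b` exactly up to rounding, and the row sums converge to `a` as the iterations proceed. Each update reads the full cost matrix, which is the memory traffic an I/O-aware GPU implementation would aim to reduce.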
-
New framework speeds up discrete optimization on GPUs
Researchers have developed a new CPU-GPU framework to accelerate optimization problems with discrete variables, which have historically been challenging for GPUs. This framework processes branch and bound nodes in batch…
-
AI industry pivots to token economics, focusing on inference computing centers
The AI industry is shifting its focus from model parameters to computational efficiency, with "token economics" emerging and the token becoming a new unit of value. This transition is driving demand for "token factories" – intelligent computi…
-
New MTP technique speeds AI token generation but needs more VRAM
A new method called MTP (Multi-Token Prediction) has been developed to accelerate token generation in AI models. This technique involves predicting multiple future tokens simultaneously and then having the main model ve…
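The verification step can be illustrated with a simple greedy acceptance rule, as used in draft-and-verify decoding. This is a sketch under the assumption of greedy decoding; the summary does not say which acceptance scheme the MTP method uses. The main model scores all drafted tokens in one forward pass. Drafted tokens are kept as long as they match the main model's own prediction, and the first mismatch is replaced by the main model's token. The function name `greedy_verify` is hypothetical.

```python
from typing import List


def greedy_verify(draft: List[int], target: List[int]) -> List[int]:
    """Accept drafted tokens under a greedy rule (hypothetical helper).

    draft:  k tokens proposed by the multi-token prediction heads.
    target: k + 1 tokens, where target[i] is the main model's greedy choice
            after the prefix plus draft[:i] (all from one forward pass).
    Returns the tokens committed this step (always at least one).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")
    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            # First disagreement: keep the main model's token and stop.
            accepted.append(t)
            return accepted
        accepted.append(d)
    # Every draft token matched, so the extra target token comes for free.
    accepted.append(target[-1])
    return accepted


print(greedy_verify([5, 7, 9], [5, 7, 2, 4]))
```

Each step commits between one and k + 1 tokens for a single main-model pass, which is where the speedup comes from. The extra prediction heads and the wider verification pass are one plausible source of the additional VRAM the headline mentions.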
-
AI chip investor prioritizes product definition over tech
Li Yang, a partner at SenseTime Guoxiang Capital, discusses the AI chip investment landscape, emphasizing that product definition and future use cases are more critical than technology alone. He highlights the shift fro…
-
GPUs Emerge as New Data Storage Paradigm, Mirroring Early Database Challenges
The article posits that GPUs are becoming the new databases, drawing parallels to the early days of database management. Just as teams fumbled through early database adoption, they are now navigating the complexities of…
-
vLLM production guide details key config decisions for performance
This article provides a guide for optimizing vLLM deployments, focusing on three critical configuration decisions that impact performance and cost. It details how static KV cache allocation can lead to GPU out-of-memory…
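Sizing a KV cache up front comes down to simple arithmetic: each cached token stores one key and one value vector per KV head in every layer. The sketch below is a generic estimate, not vLLM's internal allocator. It ignores block or page granularity, fragmentation, and activation memory. The function names and the model shape in the example are hypothetical.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Bytes of KV cache per token (hypothetical helper).

    Factor of 2: one key vector and one value vector per KV head per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(budget_bytes: int, num_layers: int, num_kv_heads: int,
                      head_dim: int, bytes_per_elem: int) -> int:
    """How many tokens fit in a fixed KV-cache budget (rounded down)."""
    per_token = kv_cache_bytes_per_token(num_layers, num_kv_heads,
                                         head_dim, bytes_per_elem)
    return budget_bytes // per_token


# Hypothetical model: 32 layers, 8 KV heads (grouped-query attention),
# head_dim 128, fp16 cache (2 bytes per element).
per_tok = kv_cache_bytes_per_token(32, 8, 128, 2)
tokens = max_cached_tokens(20 * 2**30, 32, 8, 128, 2)  # 20 GiB budget
print(per_tok, tokens)
```

For this assumed shape, each token costs 128 KiB, so a 20 GiB budget holds 163,840 tokens, or about 20 concurrent 8,192-token sequences. Exceeding that budget, or reserving it statically alongside other memory, is how out-of-memory errors arise.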