graphics processing unit
PulseAugur coverage of the graphics processing unit (GPU): every cluster mentioning GPUs across labs, papers, and developer communities, ranked by signal.
- used by Vulkan 90%
- used by Triton 90%
- used by central processing unit 70%
- competes with Tensor Processing Unit 70%
- competes with application-specific integrated circuit 70%
- instance of high-performance computing 70%
- uses data processing unit 70%
- used by H.1000 Gnome 70%
- used by Innu-aimun 70%
- used by SemiAnalysis 70%
- competes with Cerebras Systems 70%
- used by AI inference 70%
-
Broadcom, FuriosaAI partner on Ethernet AI inference platform
Broadcom and FuriosaAI have partnered to develop a rack-scale inference platform that aims to move AI infrastructure away from GPU-centric designs. This collaboration integrates FuriosaAI's processor architecture with B…
-
Loongson Technology to raise 2.3B yuan for advanced chip R&D
Loongson Technology plans to raise up to 2.3 billion yuan to fund the R&D and industrialization of chips using Xnm process technology. The funds will also support the development of key CPU and GPU technologies. This in…
-
Sifangda begins small-batch supply of diamond heat sinks after client tests
Sifangda has successfully passed testing for its diamond heat dissipation sheets with an overseas client and has begun small-batch supply. These CVD diamond sheets boast a thermal conductivity exceeding 2,000 W/(m·K), mak…
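For a rough sense of why that conductivity figure matters, one-dimensional conduction through a thin slab follows Fourier's law, ΔT = P·L / (k·A). The sketch below compares a copper spreader (about 400 W/(m·K)) with diamond at the 2000 W/(m·K) figure quoted above. The heat load, thickness, and contact area are assumed values for illustration only, and the model ignores interface resistance and lateral heat spreading.

```python
# Rough 1-D conduction estimate (Fourier's law): delta_T = P * L / (k * A).
# All inputs below are illustrative assumptions, not figures from Sifangda.

def temperature_drop(power_w: float, thickness_m: float,
                     k_w_per_mk: float, area_m2: float) -> float:
    """Temperature difference (K) across a slab conducting power_w watts."""
    return power_w * thickness_m / (k_w_per_mk * area_m2)

POWER_W = 700.0        # assumed accelerator heat load
THICKNESS_M = 0.5e-3   # assumed 0.5 mm spreader thickness
AREA_M2 = 8e-4         # assumed 8 cm^2 contact area

copper = temperature_drop(POWER_W, THICKNESS_M, 400.0, AREA_M2)    # copper ~400 W/(m·K)
diamond = temperature_drop(POWER_W, THICKNESS_M, 2000.0, AREA_M2)  # CVD diamond, 2000 W/(m·K)

print(f"copper: {copper:.2f} K, diamond: {diamond:.2f} K")
```

Because ΔT scales as 1/k, a fivefold higher conductivity gives a fivefold smaller temperature drop across the same slab.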
-
AI infrastructure advances target GPU savings and agent system standards
This ML digest covers advances in AI infrastructure, focusing on cutting GPU costs by a factor of 2.5 and optimizing AI for backend operations. It explores new standards for agent systems and addresses challenges in depl…
-
New Qrita Algorithm Boosts LLM Sampling Efficiency
Researchers have developed Qrita, a novel algorithm designed to enhance the efficiency of Top-k and Top-p sampling in large language models. By employing Gaussian-based sigma-truncation and a quaternary pivot search, Qr…
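Qrita's sigma-truncation and quaternary pivot search are not reproduced here. As a point of reference, this is a minimal sketch of the conventional sort-based Top-k/Top-p filter that such methods aim to speed up. The function names are hypothetical, and the full O(V log V) sort over the vocabulary is presumably the step a pivot search is meant to avoid.

```python
import math
import random

def top_k_top_p_filter(logits: list[float], k: int, p: float) -> list[tuple[int, float]]:
    """Return (token_id, prob) pairs kept by Top-k, then Top-p, renormalized."""
    # Sort token ids by logit, highest first: O(V log V) over the vocabulary.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # Top-k: keep only the k highest-scoring tokens.
    top = order[:k]
    # Softmax over the survivors, subtracting the max logit for numerical stability.
    m = logits[top[0]]
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches p.
    kept: list[tuple[int, float]] = []
    cumulative = 0.0
    for i, pr in zip(top, probs):
        kept.append((i, pr))
        cumulative += pr
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    mass = sum(pr for _, pr in kept)
    return [(i, pr / mass) for i, pr in kept]

def sample(logits: list[float], k: int = 50, p: float = 0.9, rng=random) -> int:
    """Draw one token id from the filtered distribution."""
    candidates = top_k_top_p_filter(logits, k, p)
    ids = [i for i, _ in candidates]
    weights = [w for _, w in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]

print(top_k_top_p_filter([2.0, 1.0, 0.5, -1.0, 3.0], k=3, p=0.7))
```

With these logits, Top-k keeps tokens 4, 0, and 1; token 4 alone carries about 0.67 of the mass, so Top-p at 0.7 also admits token 0 and stops there.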
-
AI infrastructure buyers reserve $4B in cooling capacity years ahead
Modine has secured a significant deal worth over $4 billion with an unnamed AI infrastructure customer, which includes a $165 million upfront payment to finance manufacturing expansion. This agreement, extending through…
-
Kubernetes enhances GPU management with Dynamic Resource Allocation
Kubernetes has evolved its GPU management capabilities beyond simply counting devices. The new Dynamic Resource Allocation (DRA) feature allows for more granular control, enabling specific resource profiles, memory allo…
-
Sakura Internet boosts AI-driven capex amid Japan's growing demand
Sakura Internet, a Japanese data center and cloud service provider, is significantly increasing its capital expenditure for the 2026 fiscal year. This increase, which could reach 30 billion yen, is driven by the s…
-
Detecting GPU Waste in Kubernetes Clusters
This article discusses how to identify and address GPU waste within Kubernetes clusters, a problem that often goes unnoticed due to seemingly healthy utilization metrics. It highlights that inefficient GPU usage can occ…
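One well-known way GPU waste hides behind healthy-looking numbers: the coarse "GPU utilization" figure reported by nvidia-smi measures the fraction of time any kernel is executing, so a GPU running a single small kernel can read close to 100%. The sketch below flags pods that are either allocated but idle, or busy by that coarse metric while their SM activity (a DCGM-style profiling metric) stays low. The record type, field names, and thresholds are hypothetical, and this is not presented as the article's method.

```python
from dataclasses import dataclass

@dataclass
class GpuSample:
    pod: str
    gpu_util: float   # hypothetical field: fraction of time any kernel was running (0-1)
    sm_active: float  # hypothetical field: fraction of SM cycles with work resident (0-1)

def flag_gpu_waste(samples: list[GpuSample], idle_util: float = 0.05,
                   util_floor: float = 0.8, sm_ceiling: float = 0.3) -> list[tuple[str, str]]:
    """Return (pod, reason) pairs for pods whose averaged metrics suggest waste."""
    # Group samples by pod so each pod is judged over the whole window.
    by_pod: dict[str, list[GpuSample]] = {}
    for s in samples:
        by_pod.setdefault(s.pod, []).append(s)
    findings = []
    for pod, rows in sorted(by_pod.items()):
        util = sum(r.gpu_util for r in rows) / len(rows)
        sm = sum(r.sm_active for r in rows) / len(rows)
        if util < idle_util:
            # GPU is reserved but nothing runs on it.
            findings.append((pod, "allocated but idle"))
        elif util >= util_floor and sm <= sm_ceiling:
            # Looks busy on the coarse metric, but most SMs sit idle.
            findings.append((pod, "high utilization, low SM activity"))
    return findings

samples = [
    GpuSample("train-a", 0.95, 0.85), GpuSample("train-a", 0.97, 0.80),
    GpuSample("notebook-b", 0.0, 0.0),
    GpuSample("infer-c", 0.99, 0.10), GpuSample("infer-c", 0.96, 0.12),
]
print(flag_gpu_waste(samples))
```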
-
Chinese AI startups secure over $15B in Q1 funding
In the first quarter, China's AI sector drew over 110 billion yuan in funding, with investment in domestic large language model developers surging. Companies like Moonshot AI and Jieyue Xingchen (StepFun) secured over 30 billion yuan…
-
LLM inference: CPU vs GPU trade-offs detailed for local deployments
This article explores the practical differences between CPU and GPU inference for large language models (LLMs) using the llama.cpp framework. It highlights that while GPUs offer superior speed, CPUs can be a viable alte…
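A common rule of thumb explains much of the speed gap: single-stream token generation is usually memory-bandwidth bound, because producing each new token requires reading roughly every weight once. Dividing memory bandwidth by model size therefore gives an upper bound on decode speed. The bandwidth figures and the ~4.5 bits per weight below are assumptions for illustration, and the estimate ignores prompt processing (which is compute-bound), KV-cache traffic, and real-world efficiency losses.

```python
def max_decode_tokens_per_s(params_billion: float, bits_per_weight: float,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed if all weights are read once per token."""
    model_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes

# Assumed numbers: a 7B-parameter model quantized to ~4.5 bits per weight.
for name, bw in [("dual-channel DDR5 CPU (~80 GB/s)", 80.0),
                 ("discrete GPU (~1000 GB/s)", 1000.0)]:
    print(f"{name}: <= {max_decode_tokens_per_s(7, 4.5, bw):.0f} tokens/s")
```

Under these assumptions the bound is about 20 tokens/s on the CPU versus about 254 on the GPU; the ratio tracks the bandwidth gap, which is why smaller or more heavily quantized models keep CPU decoding usable.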
-
Anyscale details Ray Data for scaling multimodal AI data pipelines
Anyscale's blog post details challenges in scaling multimodal AI data pipelines, where preprocessing often starves GPUs, leading to underutilization. The article explains that traditional staged batch execution, which i…
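The difference between staged and streaming execution can be shown in a few lines of plain Python. This is not Ray Data's API: `preprocess` is a hypothetical stand-in for CPU-side decoding, and the counter simply records how much preprocessing must finish before the first batch could reach a GPU.

```python
from typing import Callable, Iterable, Iterator

processed = 0  # how many items have been preprocessed so far

def preprocess(item: int) -> int:
    # Hypothetical stand-in for CPU-side work such as decoding or resizing.
    global processed
    processed += 1
    return item * 2

def staged(items: Iterable[int], batch_size: int) -> Iterator[list[int]]:
    # Staged execution: run the whole preprocessing stage and materialize
    # every result before the first batch is handed to inference.
    done = [preprocess(x) for x in items]
    for i in range(0, len(done), batch_size):
        yield done[i:i + batch_size]

def streaming(items: Iterable[int], batch_size: int) -> Iterator[list[int]]:
    # Streaming execution: hand each batch over as soon as it is ready, so the
    # consumer can start on early batches while later items are unprocessed.
    batch: list[int] = []
    for x in items:
        batch.append(preprocess(x))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def items_preprocessed_before_first_batch(
        pipeline: Callable[..., Iterator[list[int]]],
        n: int = 1000, batch_size: int = 32) -> int:
    global processed
    processed = 0
    next(pipeline(range(n), batch_size))  # pull only the first batch
    return processed

print(items_preprocessed_before_first_batch(staged))     # 1000
print(items_preprocessed_before_first_batch(streaming))  # 32
```

Both pipelines yield identical batches; only the point at which the first one becomes available differs, which is what keeps a downstream GPU from waiting on the entire preprocessing stage.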
-
AI pricing shifts to flexible models amid rising hardware and operational costs
The existing fixed pricing models for AI services are becoming unsustainable due to rising inference costs and increased usage. Surging prices for GPUs and High Bandwidth Memory (HBM), coupled with higher power and cool…
-
Modal launches autoscaling GPUs for AI research agents
Modal has introduced an autoscaling feature for GPUs designed to support AI research agents. This new capability allows agents to dynamically provision and release compute resources as needed, addressing the challenge o…
-
Modal achieves serverless GPUs for AI inference in seconds
Modal has developed a system to achieve truly serverless GPUs for AI inference, addressing the challenge of rapidly scaling resources to meet variable demand. Their approach involves maintaining cloud buffers of idle GP…
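A buffer-based scaling rule can be sketched as a single function: provision enough GPU containers for the current load, then keep a few extra idle ones warm so a burst lands on a ready GPU instead of waiting for a cold start. The function, its parameters, and the cap are hypothetical and do not describe Modal's actual scheduler or API.

```python
import math

def target_gpu_containers(in_flight: int, per_container_concurrency: int,
                          idle_buffer: int, max_containers: int) -> int:
    """Hypothetical policy: containers for current load plus a warm idle buffer."""
    # Containers needed to serve the requests currently in flight, rounded up.
    needed = math.ceil(in_flight / per_container_concurrency)
    # Add idle warm capacity, but never exceed the configured ceiling.
    return min(needed + idle_buffer, max_containers)

print(target_gpu_containers(9, 4, 2, 100))  # 3 busy + 2 warm = 5
```

The buffer trades the cost of a few idle GPUs for avoiding cold-start latency; a larger buffer absorbs bigger bursts at higher standing cost.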
-
Anker launches AI chip, Poland seeks EU AI factory, California passes automation law
Anker is entering the processor market with its new Thus chip, which uses a compute-in-memory architecture to deliver 150x more AI processing power for its upcoming Soundcore headphones. Meanwhile, Poland is vying for a B…
-
Shenmou targets wireless cameras with ultra-low-power chips
Shenmou, led by Yang Zuoxing, is developing ultra-low-power chip designs to free cameras from wires, envisioning a future with billions of smart visual terminals. Their first-generation chip achieves one-third the indus…
-
Stanford's ThunderKittens DSL optimizes AI kernel performance
A new article details ThunderKittens, a compact domain-specific language (DSL) developed at Stanford's Hazy Research Lab for creating high-performance AI kernels. The DSL aims to strike a balance between research produc…
-
LLM reliability and cost-efficiency drive new infrastructure solutions
The integration of Large Language Models (LLMs) into professional workflows is shifting from experimental use to essential tooling, emphasizing collaboration rather than automation. However, the reliability of these LLM…
-
WarmServe system prewarms GPUs for faster multi-LLM serving
Researchers have developed WarmServe, a new system designed to improve the efficiency of serving multiple large language models (LLMs) on shared GPU clusters. WarmServe utilizes a one-for-many GPU prewarming strategy, p…