graphics processing unit
PulseAugur coverage of the graphics processing unit (GPU): every cluster mentioning GPUs across labs, papers, and developer communities, ranked by signal.
- used by Vulkan 90%
- used by Triton 90%
- used by central processing unit 70%
- competes with Tensor Processing Unit 70%
- competes with application-specific integrated circuit 70%
- competes with Apple Neural Engine 70%
- instance of high-performance computing 70%
- used by AI inference 70%
- used by H.1000 Gnome 70%
- used by Innu-aimun 70%
- competes with Cerebras Systems 70%
- used by SemiAnalysis 70%
- Datavault AI raises $120M to build nationwide GPU network for AI compute
Datavault AI has secured $120 million in funding from Scilex Holding to establish a nationwide GPU network. This initiative aims to provide increased computing power for companies engaged in artificial intelligence deve…
- FluxMoE system decouples expert weights for faster LLM serving
Researchers have developed FluxMoE, a new system designed to improve the efficiency of serving Mixture-of-Experts (MoE) models. FluxMoE addresses the challenge of large parameter sizes in MoE models by decoupling expert…
- RoundPipe enables efficient LLM fine-tuning on consumer GPUs
Researchers have developed RoundPipe, a new pipeline scheduling method designed to make fine-tuning large language models on consumer-grade GPUs more efficient. This approach addresses the limitations of existing method…
- Study finds switchless networks more cost-effective for MoE LLM serving
A new paper analyzes network topologies for Mixture-of-Experts (MoE) Large Language Model (LLM) serving, finding that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructures. T…
- VkSplat pipeline boosts 3D Gaussian Splatting training with Vulkan compute
Researchers have developed VkSplat, a novel training pipeline for 3D Gaussian Splatting (3DGS) that utilizes Vulkan compute for enhanced performance and broader compatibility. This new approach offers a significant spee…
- Researchers combine DPUs and GPUs for faster neural network inference
Researchers have developed a novel method for accelerating neural network inference by splitting Convolutional Neural Network (CNN) computations between Deep Learning Processing Units (DPUs) and Graphics Processing Unit…
- AI tools read code, not minds; Chinese GPU maker revenue hits $423M
Cambricon, a Chinese GPU maker, reported first-quarter revenue of $423 million. Separately, a blog post discusses how AI tools can read code but not minds, and another mentions AI breaking Silicon Valley's global pl…
- Strait system enhances ML inference serving with priority-aware scheduling
Researchers have developed Strait, a new system designed to improve the efficiency of machine learning inference serving, particularly in on-premises environments. Strait addresses limitations in task prioritization and…
- China dominates critical minerals for AI supply chain
Six critical chokepoints in the AI supply chain, from raw materials to finished chips, are dominated by China. The country processes 90% of rare earths, highlighting its significant control over the production of GPUs, …
- T-Head unveils Panmai 920 smartNIC, completing its AI infrastructure chip lineup
T-Head, a subsidiary of Alibaba, has launched its first smart network interface card (smartNIC), the "Panmai 920," designed to address bottlenecks in AI computing infrastructure. This new network card utilizes advanced PCIe 5.0 and 1…
- Google to sell its TPUs to some customers, who also fancy big-G GPUs
Alphabet announced a significant increase in its 2026 capital expenditure guidance, raising it to $180-$190 billion, driven by unprecedented demand for AI computing resources. The company's CFO highlighted strong growth…
- Visual explainers detail GPU's AI role and embedding vector meaning
A visual explainer details why Graphics Processing Units (GPUs) are highly effective for artificial intelligence tasks, highlighting their strengths in matrix multiplication, parallel processing, memory bandwidth, and b…
- FloatSOM framework accelerates distributed Self-Organizing Maps with flexible topologies
Researchers have developed FloatSOM, a new framework designed for large-scale Self-Organizing Map (SOM) analysis that overcomes memory limitations on GPUs. This framework enables multi-GPU execution and supports out-of-…
- GPU firmware lags behind hardware, throttling AI workloads
The article argues that current GPU firmware is outdated, relying on early 2000s logic to manage modern AI workloads. This outdated firmware is identified as a bottleneck, potentially throttling the performance of advan…
- GPU shortage becomes AI's biggest bottleneck, spurring efficiency focus
Escalating demand for Graphics Processing Units (GPUs) has made their supply the primary constraint on the advancement of artificial intelligence. In response, organizations are increasingly adopting strategies focused on dev…
- AI era demands flexible data center investments, moving beyond old refresh cycles
The AI era is forcing a significant shift in data center infrastructure investments, moving away from traditional refresh cycles. Companies are now navigating multiple, often misaligned, technology lifecycles for comput…
- Stanford researchers develop new hardware to efficiently process sparse AI models
Researchers at Stanford University have developed a novel hardware chip designed to efficiently process sparse AI models. Sparsity, where most AI model parameters are zero, offers significant computational savings but i…
- AHASD architecture boosts LLM speculative decoding on mobile devices
Researchers have developed AHASD, a novel asynchronous heterogeneous architecture designed to optimize large language model (LLM) inference on mobile devices. This architecture employs task-level decoupling for parallel…
- DeepSeek V4 prioritizes batch invariance, sacrificing GPU efficiency for stability
DeepSeek V4's technical report reveals a core design choice of "batch invariance" to ensure consistent outputs across different batch configurations and processing pipelines. This feature is crucial for maintaining repr…
- New GPU framework accelerates quantum state calculations for complex systems
Researchers have developed QiankunNet-cuSCI, a novel framework that fully accelerates the NNQS-SCI method for solving complex quantum systems using GPUs. This new approach addresses the scalability limitations of previo…