graphics processing unit
PulseAugur coverage of graphics processing unit — every cluster mentioning graphics processing unit across labs, papers, and developer communities, ranked by signal.
- used by Vulkan 90%
- used by Triton 90%
- used by central processing unit 70%
- competes with Tensor Processing Unit 70%
- competes with application-specific integrated circuit 70%
- competes with Apple Neural Engine 70%
- instance of high-performance computing 70%
- used by AI inference 70%
- used by H.1000 Gnome 70%
- used by Innu-aimun 70%
- competes with Cerebras Systems 70%
- used by SemiAnalysis 70%
-
GPU hardware analysis reveals memory bandwidth, not FLOPS, is key for LLMs
This article explains the fundamental architecture of GPUs, focusing on how their design prioritizes memory bandwidth over raw computational power for machine learning tasks. It details how GPUs manage thousands of thre…
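A back-of-the-envelope roofline check can make the bandwidth-versus-FLOPS point concrete. The sketch below is illustrative only: the peak throughput and bandwidth figures are hypothetical placeholders, not numbers from the article, and the function names are invented for this example.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline model: throughput is capped either by the compute peak
    or by how fast memory can feed the compute units.

    arithmetic_intensity is measured in FLOPs per byte moved.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    """True when memory traffic, not compute, sets the ceiling."""
    return mem_bandwidth * arithmetic_intensity < peak_flops


# Hypothetical accelerator (assumed values, for illustration only).
PEAK_FLOPS = 1.0e15   # 1 PFLOP/s of dense half-precision compute
MEM_BANDWIDTH = 2.0e12  # 2 TB/s of memory bandwidth

# Single-token LLM decoding is dominated by matrix-vector products:
# each 2-byte weight is read once and used for 2 FLOPs (multiply + add),
# so the intensity is roughly 1 FLOP per byte.
GEMV_INTENSITY = 2 / 2

if __name__ == "__main__":
    perf = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, GEMV_INTENSITY)
    print(f"attainable: {perf:.2e} FLOP/s "
          f"({100 * perf / PEAK_FLOPS:.2f}% of peak)")
    print("memory bound:",
          is_memory_bound(PEAK_FLOPS, MEM_BANDWIDTH, GEMV_INTENSITY))
```

Under these assumed numbers the workload only becomes compute bound once the intensity exceeds PEAK_FLOPS / MEM_BANDWIDTH, which is 500 FLOPs per byte here.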
-
Datacenter AI clusters rely on Indium Phosphide for laser chips
Indium Phosphide (InP) is a critical semiconductor material used in datacenter laser chips and optical transceivers that connect GPUs in AI clusters. Its unique crystal lattice allows for the growth of alloys that emit …
-
Astera Labs launches new fabric switch to boost AI workload efficiency
Astera Labs has introduced a new smart fabric switch, the Scorpio X-Series, designed to address inefficiencies in AI infrastructure. This new hardware aims to reduce coordination overhead and improve accelerator utiliza…
-
India's Krutrim pivots to cloud amid GPU woes; new tech tackles RAG hallucinations
India's first generative AI unicorn, Krutrim, is shifting its focus from developing sovereign AI models to offering cloud services by 2026. This pivot is driven by the economic realities and significant GPU shortages im…
-
Graph Neural Networks accelerate VLSI design with faster capacitance modeling
Researchers have developed GNN-Ceff, a novel method utilizing Graph Neural Networks for post-layout effective capacitance modeling in VLSI design. This approach aims to improve the accuracy and speed of static timing an…
-
SwiftChannel framework co-designs AI hardware for faster 5G channel estimation
Researchers have developed SwiftChannel, a novel algorithm-hardware co-design framework for deep learning-based 5G channel estimation. This framework integrates a hardware-friendly convolutional neural network with a de…
-
SURGE system optimizes GPU encoding for large-scale text embedding generation
Researchers have developed SURGE, a new system designed to improve the efficiency of generating text embeddings on GPUs. SURGE addresses the bottleneck of processing numerous small data partitions by employing a streami…
-
New CUDA implementation speeds up optimal transport calculations on GPUs
Researchers have developed FastSinkhorn, a new CUDA implementation for the Sinkhorn algorithm used in optimal transport computations. This method operates entirely in the log-domain, ensuring numerical stability even wi…
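For readers unfamiliar with the log-domain formulation, here is a minimal NumPy reference sketch of Sinkhorn iterations carried out on dual potentials with log-sum-exp. It is not FastSinkhorn's CUDA code; the function names, default parameters, and update ordering are assumptions chosen for illustration.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_log(a, b, cost, eps=0.1, n_iters=200):
    """Entropic optimal transport between histograms a (n,) and b (m,).

    Works on dual potentials f, g instead of scaling vectors, so the
    kernel exp(-cost / eps) is never formed explicitly and small eps
    does not underflow. Returns the transport plan of shape (n, m).
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a, dtype=float)
    g = np.zeros_like(b, dtype=float)
    for _ in range(n_iters):
        # Row update: enforce sum_j P_ij = a_i.
        f = eps * (log_a - logsumexp((g[None, :] - cost) / eps, axis=1))
        # Column update: enforce sum_i P_ij = b_j.
        g = eps * (log_b - logsumexp((f[:, None] - cost) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 4)
    y = np.linspace(0.0, 1.0, 5)
    cost = (x[:, None] - y[None, :]) ** 2
    a = np.full(4, 1 / 4)
    b = np.full(5, 1 / 5)
    plan = sinkhorn_log(a, b, cost, eps=0.5)
    print(plan.sum(axis=1), plan.sum(axis=0))
```

Subtracting the maximum inside `logsumexp` is what keeps the iteration finite when `eps` is small relative to the costs, the regime where a naive kernel-based implementation would underflow.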
-
New SPES framework enables memory-efficient decentralized LLM pretraining on fewer GPUs
Researchers have developed a novel decentralized framework called SPES for pretraining large language models, specifically Mixture-of-Experts (MoE) architectures. This method significantly reduces memory requirements by…
-
New HERMES and DSCache methods improve streaming video understanding with KV cache
Researchers have developed new methods to improve the efficiency of multimodal large language models (MLLMs) for understanding streaming video. One approach, HERMES, conceptualizes the KV cache as a hierarchical memory …
-
Zyphra's TSP strategy boosts LLM training throughput by 2.6x
Zyphra has developed a new technique called Tensor and Sequence Parallelism (TSP) designed to optimize the training and inference of large transformer models. This hardware-aware strategy combines aspects of Tensor Para…
-
NVIDIA cuOpt and OpenAI achieve breakthroughs in supply chain and voice AI
NVIDIA is enhancing supply chain decision systems with its cuOpt technology, which combines agentic AI with GPU acceleration for real-time, large-scale planning. Separately, OpenAI has achieved low-latency voice AI, del…
-
AWS SageMaker adds automatic instance fallback for AI endpoints
Amazon SageMaker has introduced a new feature called capacity-aware instance pools for AI inference endpoints. This enhancement allows users to define a prioritized list of instance types, enabling SageMaker to automati…
-
Coral and CoRAL systems optimize LLM serving and robotic control
Researchers have developed two distinct systems named Coral and CoRAL. Coral is an adaptive system designed for cost-efficient serving of multiple large language models across heterogeneous cloud GPUs, aiming to optimiz…
-
Mastodon users criticize energy consumption of AI hardware
A Mastodon user expresses frustration about the energy consumption of specialized hardware, drawing a parallel to the cryptocurrency industry. They note that ASICs have largely replaced GPUs in certain applica…
-
ODMs transition from manufacturing to AI infrastructure partners for complex racks
Original Design Manufacturers (ODMs) are transitioning from traditional hardware production to becoming key partners in AI infrastructure. This evolution is spurred by the increasing complexity of AI hardware, particula…
-
GitHub tool measures GPU 'useful' work amid AI and security buzz
A new GitHub tool called Utilyze has been released, designed to monitor GPU performance for "useful" work. The tool aims to track computational tasks beyond entertainment, incorporating buzzwords like AI, workflow autom…
-
Sasha Rush releases Autodiff Puzzles to teach automatic differentiation
Sasha Rush has released "Autodiff Puzzles," an interactive Google Colab notebook designed to teach automatic differentiation. Similar to his previous puzzle series on Tensors and GPUs, these challenges guide users throu…
-
Next-gen chips promise data centers greater efficiency and AI power
Next-generation chip designs, including those optimized for AI, energy efficiency, and heat tolerance, have the potential to significantly alter data center infrastructure. Innovations in packaging, memory, and offload …
-
Datavault AI raises $120M to build nationwide GPU network for AI compute
Datavault AI has secured $120 million in funding from Scilex Holding to establish a nationwide GPU network. This initiative aims to provide increased computing power for companies engaged in artificial intelligence deve…