Entity: graphics processing unit

graphics processing unit

PulseAugur coverage of graphics processing unit: every cluster mentioning graphics processing unit across labs, papers, and developer communities, ranked by signal.

Total · 30 days

134

Last 90 days 134

Releases · 30 days

Last 90 days 0

Papers · 30 days

Last 90 days 48

Tier distribution · 90 days

significant 9
research 35
tool 64
commentary 23
meme 3

Relations

Sentiment · 30 days

18 days with sentiment data

Recent · Page 4 of 7 · 134 items

RESEARCH · CL_23761 · May 6 · 17:45

Modal boosts multimodal inference performance over 10% with Python dict

Modal has identified a performance bottleneck in multimodal inference engines like SGLang, which can hinder GPU utilization. By profiling the scheduler, they discovered that expensive bookkeeping for shared GPU memory c…
RESEARCH · CL_20462 · May 6 · 14:18

New benchmark reveals LLM-generated GPU kernels struggle with correctness and efficiency

A new benchmark called KernelBench-X has been developed to evaluate the capabilities of large language models in generating GPU kernels. The benchmark, which covers 176 tasks across 15 categories, reveals that task stru…
TOOL · CL_19446 · May 6 · 13:58

AMD EPYC CPUs show competitive performance for LLM and TTS inference workloads

A recent analysis by Leaseweb benchmarks the performance of AMD EPYC 9334 CPUs for Large Language Model (LLM) and Text-to-Speech (TTS) inference workloads. The study reveals that while GPUs offer higher throughput, CPUs…
TOOL · CL_19402 · May 6 · 12:56

AI assists in developing Pascal version of LAPACK, aiming for GPU acceleration

A user on Mastodon is collaborating with GitHub Copilot to develop a Pascal version of the LAPACK numerical library, which is approximately 30% complete. They anticipate reaching 80% completion within two days and plan …
RESEARCH · CL_20517 · May 6 · 10:02

New tool cuts GPU memory use in AI training by optimizing optimizer states

Researchers have developed a Budget-Aware Optimizer Configurator (BAOC) to address the significant GPU memory consumption during large-scale model training. BAOC intelligently assigns different optimizer configurations …
TOOL · CL_19074 · May 6 · 09:22

AI image generation: CPU vs GPU performance and scaling insights

This article explores the performance differences between CPUs and GPUs when generating AI-created images and videos. The author shares their experience using these components for digital art creation, highlighting that…
RESEARCH · CL_19066 · May 6 · 08:30

Memory giants push new MRDIMM standard for AI, HPC servers

Major memory manufacturers Samsung Electronics, SK Hynix, and Micron are nearing completion of the next-generation server DRAM module standard, MRDIMM. This new standard is optimized for AI and high-performance computin…
TOOL · CL_18835 · May 6 · 04:00

New Polar Express method accelerates matrix decomposition for deep learning

Researchers have developed a new GPU-friendly algorithm called Polar Express for computing matrix decompositions, which is crucial for the Muon optimizer used in training deep neural networks. This method optimizes for …
TOOL · CL_18603 · May 6 · 04:00

VUDA system enables spatial sharing of compute and graphics on GPUs

Researchers have developed VUDA, a system designed to enhance GPU utilization by enabling simultaneous execution of CUDA compute and Vulkan graphics workloads. This is achieved by breaking down the isolation between the…
RESEARCH · CL_18441 · May 6 · 03:49

Lumentum CEO: AI component demand outstrips supply, orders booked until 2028

Lumentum, a major US optical module manufacturer, reported a record-breaking third fiscal quarter with revenue soaring 90% year-over-year to $808 million. The company also saw significant improvements in profitability, …
RESEARCH · CL_18429 · May 6 · 03:15

AI boom creates volatile market for video game hardware

The burgeoning AI industry is creating unprecedented demand for high-end graphics cards, significantly impacting the video game hardware market. This surge in demand is leading to shortages and price increases for GPUs,…
TOOL · CL_18041 · May 5 · 22:01

GPU hardware analysis reveals memory bandwidth, not FLOPS, is key for LLMs

This article explains the fundamental architecture of GPUs, focusing on how their design prioritizes memory bandwidth over raw computational power for machine learning tasks. It details how GPUs manage thousands of thre…
SIGNIFICANT · CL_17945 · May 5 · 21:00

Datacenter AI clusters rely on Indium Phosphide for laser chips

Indium Phosphide (InP) is a critical semiconductor material used in datacenter laser chips and optical transceivers that connect GPUs in AI clusters. Its unique crystal lattice allows for the growth of alloys that emit …
RESEARCH · CL_17304 · May 5 · 20:05

Astera Labs launches new fabric switch to boost AI workload efficiency

Astera Labs has introduced a new smart fabric switch, the Scorpio X-Series, designed to address inefficiencies in AI infrastructure. This new hardware aims to reduce coordination overhead and improve accelerator utiliza…
SIGNIFICANT · CL_16724 · May 5 · 13:42

India's Krutrim pivots to cloud amid GPU woes; new tech tackles RAG hallucinations

India's first generative AI unicorn, Krutrim, is shifting its focus from developing sovereign AI models to offering cloud services by 2026. This pivot is driven by the economic realities and significant GPU shortages im…
TOOL · CL_16219 · May 5 · 04:00

Graph Neural Networks accelerate VLSI design with faster capacitance modeling

Researchers have developed GNN-Ceff, a novel method utilizing Graph Neural Networks for post-layout effective capacitance modeling in VLSI design. This approach aims to improve the accuracy and speed of static timing an…
TOOL · CL_16179 · May 5 · 04:00

SwiftChannel framework co-designs AI hardware for faster 5G channel estimation

Researchers have developed SwiftChannel, a novel algorithm-hardware co-design framework for deep learning-based 5G channel estimation. This framework integrates a hardware-friendly convolutional neural network with a de…
TOOL · CL_16155 · May 5 · 04:00

SURGE system optimizes GPU encoding for large-scale text embedding generation

Researchers have developed SURGE, a new system designed to improve the efficiency of generating text embeddings on GPUs. SURGE addresses the bottleneck of processing numerous small data partitions by employing a streami…
TOOL · CL_16004 · May 5 · 04:00

New CUDA implementation speeds up optimal transport calculations on GPUs

Researchers have developed FastSinkhorn, a new CUDA implementation for the Sinkhorn algorithm used in optimal transport computations. This method operates entirely in the log-domain, ensuring numerical stability even wi…
TOOL · CL_15971 · May 5 · 04:00

New SPES framework enables memory-efficient decentralized LLM pretraining on fewer GPUs

Researchers have developed a novel decentralized framework called SPES for pretraining large language models, specifically Mixture-of-Experts (MoE) architectures. This method significantly reduces memory requirements by…
