PulseAugur

Cuda

PulseAugur coverage of Cuda — every cluster mentioning Cuda across labs, papers, and developer communities, ranked by signal.

Total · 30d: 38 (38 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 16 (16 over 90d)
TIER MIX · 90D
RELATIONSHIPS
SENTIMENT · 30D: 3 days with sentiment data

RECENT · PAGE 2/2 · 34 TOTAL
  1. RESEARCH · CL_07063 ·

    New GPU framework accelerates quantum state calculations for complex systems

    Researchers have developed QiankunNet-cuSCI, a novel framework that fully accelerates the NNQS-SCI method for solving complex quantum systems using GPUs. This new approach addresses the scalability limitations of previo…

  2. RESEARCH · CL_10487 ·

    AMD's MI300X falls short of Nvidia in AI training due to software issues

    A recent benchmark analysis by SemiAnalysis found that AMD's MI300X GPU, despite theoretical advantages in specifications and total cost of ownership, does not compete effectively with Nvidia's H100 and H200 in training…

  3. RESEARCH · CL_06196 ·

    PointTransformerX offers portable, efficient 3D point cloud processing without sparse algorithms

    Researchers have developed PointTransformerX (PTX), a new vision transformer backbone for processing 3D point clouds that eliminates the need for custom CUDA operators. This PyTorch-native model achieves competitive acc…

  4. RESEARCH · CL_03577 ·

    llama.cpp and ik_llama.cpp add FP4 inference support for VRAM savings

    The llama.cpp and ik_llama.cpp projects have both integrated support for FP4 (4-bit floating-point) inference, a significant advancement for model quantization. llama.cpp now includes NVFP4, an Nvidia-specific format, w…
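The VRAM savings FP4 targets are easy to estimate. As a back-of-envelope sketch (not llama.cpp's actual NVFP4 storage layout, which also carries per-block scale metadata), weight memory scales linearly with bits per weight:

```python
def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate VRAM needed for model weights alone, ignoring
    KV cache, activations, and runtime buffers."""
    return n_params * bits_per_weight / 8 / 1024**3

params = 70e9  # a hypothetical 70B-parameter model
fp16 = weight_vram_gb(params, 16)   # ~130 GB
# Quantized formats cost slightly more than their nominal width once
# per-block scales are included; ~4.5 bits/weight is a rough figure.
fp4 = weight_vram_gb(params, 4.5)   # ~37 GB
print(f"FP16: {fp16:.1f} GB, FP4-ish: {fp4:.1f} GB")
```

Roughly a 3.5x reduction, which is what makes large models fit on a single consumer GPU.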

  5. TOOL · CL_03576 ·

    llama.cpp CUDA pull request optimizes MMQ stream-k overhead for MoE models

    A pull request to the llama.cpp project aims to reduce overhead in CUDA's MMQ stream-k operations. This optimization targets Mixture of Experts (MoE) models, potentially leading to faster prompt processing speeds. The c…

  6. FRONTIER RELEASE · CL_03105 ·

    DeepSeek releases V4 Pro and Flash models with 1M context, runs on Huawei chips

    DeepSeek has released its new V4 family of models, including V4 Pro and V4 Flash, which boast a 1 million token context window. These models were trained on 32 trillion tokens and feature a novel hybrid attention system…

  7. SIGNIFICANT · CL_05791 ·

Tianshu Zhixin cuts inference chip prices to gain market share amid revenue concerns

    Chinese AI chip designer Tianshu Zhixin reported 10.34 billion yuan in revenue for 2025, a 91.6% year-over-year increase, though this fell short of market expectations. The company's training chip series, "Tianhe," rema…

  8. FRONTIER RELEASE · CL_05793 ·

    DeepSeek V4 to launch late April with trillion parameters, Huawei Ascend chip support

    DeepSeek founder Liang Wenfeng has revealed that the company's next-generation flagship model, DeepSeek V4, is slated for release in late April. This new model is expected to feature trillion-scale parameters and a mill…

  9. TOOL · CL_18066 ·

    AI coding assistants like Claude reignite passion for older developers

    Several older developers are finding renewed passion for coding due to AI coding assistants like Claude Code. These tools allow them to focus on architectural design and problem-solving without getting bogged down in th…

  10. TOOL · CL_17743 ·

    PHP-ORT brings machine learning inference to PHP developers

    A new infrastructure project called PHP-ORT aims to bring machine learning inference capabilities directly to PHP, the server-side language used by a significant portion of the web. This development seeks to empower mil…

  11. TOOL · CL_17711 ·

    ParaQuery launches GPU-accelerated Spark SQL for cost-efficient data processing

    ParaQuery, a new startup, has launched a GPU-accelerated Spark and SQL data processing solution. The platform aims to offer cost and performance benefits over existing solutions like Google BigQuery. ParaQuery leverages…

  12. TOOL · CL_17783 ·

    NetHack ML model performance plummets 40% due to mysterious bug

    Researchers Bartłomiej Cupiał and Maciej Wołczyk observed a significant performance drop in their neural network trained to play NetHack. The model, which had been consistently scoring around 5,000 points, suddenly bega…

  13. SIGNIFICANT · CL_00880 ·

George Hotz's tiny corp unveils $15K AI computer and RISC-inspired tinygrad framework

    George Hotz's company, tiny corp, has launched the tinybox, a $15,000 personal AI computer designed for local model training and inference. The tinybox boasts 738 FP16 TFLOPS and 144 GB of GPU RAM, capable of running a …

  14. COMMENTARY · CL_04729 ·

    Eugene Yan: MOOCs offer diminishing returns; real learning comes from doing

    Eugene Yan argues that while Massive Open Online Courses (MOOCs) can be useful for initial learning, they often lead to diminishing returns and can even become a form of procrastination. He suggests that true learning, …