GEMM

PulseAugur coverage of GEMM: every cluster mentioning GEMM across labs, papers, and developer communities, ranked by signal.

Total · 30d: 6 (6 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 4 (4 over 90d)

SENTIMENT · 30D

2 days with sentiment data

RECENT · PAGE 1/1 · 6 TOTAL

RESEARCH · CL_112712 · Jun 23 · 11:38

New book details modern GPU programming for AI workloads

A new book titled "Modern GPU Programming for MLSys" aims to demystify high-performance GPU kernel development for machine learning systems. The book, originating from Carnegie Mellon University's Machine Learning Syste…

TOOL · CL_86852 · Jun 12 · 04:00

Apple M4 Max GPU's Tensor Compute Path Emulated, Not Accelerated

Researchers have reverse-engineered the Metal 4.1 tensor compute path on Apple's M4 Max GPU, revealing that the fp8 matmul2d operation is emulated rather than hardware-accelerated. This means the operation runs on the G…

TOOL · CL_53815 · May 27 · 04:00

New framework enhances LLM-generated Verilog with feedback and skill evolution

Researchers have developed Verilog-Evolve, a novel framework designed to enhance the generation of Verilog code using large language models. This system moves beyond isolated sampling and functional checking by incorpor…

TOOL · CL_51969 · May 26 · 08:50

TileLang simplifies GPU kernel writing with Python interface

A new programming language called TileLang aims to simplify GPU kernel development by offering a middle ground between high-level frameworks like Triton and low-level libraries like CUTLASS. TileLang allows developers to …

RESEARCH · CL_26186 · May 11 · 08:36

Sakana AI, NVIDIA unveil TwELL for faster LLM training and inference

Researchers from Sakana AI and NVIDIA have developed TwELL, a novel method that significantly speeds up large language model (LLM) operations. By targeting the feedforward layers, which are computationally intensive, Tw…

RESEARCH · CL_14208 · May 1 · 09:28

Tempus framework offers scalable, resource-efficient GEMM for edge AI

Researchers have developed Tempus, a new framework designed to optimize General Matrix Multiplication (GEMM) for edge AI deployments on AMD Versal SoCs. Unlike existing spatial scaling methods that fail on resource-cons…