AMD MI300X

PulseAugur coverage of AMD MI300X — every cluster mentioning AMD MI300X across labs, papers, and developer communities, ranked by signal.

Total · 30d

8 over 90d

Releases · 30d

0 over 90d

Papers · 30d

1 over 90d

SENTIMENT · 30D

3 day(s) with sentiment data

RECENT · PAGE 1/1 · 8 TOTAL

TOOL · CL_106546 · Jun 22 · 07:13

MoonMath AI open-sources HIP attention kernel for AMD MI300X, beating AITER v3

MoonMath AI has open-sourced a new bf16 forward attention kernel for AMD's MI300X GPU, written in HIP. This kernel reportedly outperforms AMD's own AITER v3 across various configurations, achieving up to a 1.26x speedup…

RESEARCH · CL_100348 · Jun 19 · 07:32

MoonMath AI open-sources AMD MI300X attention kernel outperforming AITER v3 · 3 sources tracked

MoonMath AI has released an open-source HIP attention kernel for AMD's MI300X GPU, which reportedly outperforms AMD's own AITER v3. The kernel achieves speedups of up to 1.26x by optimizing memory placement and using on…

TOOL · CL_91546 · Jun 15 · 06:51

Qwen3 32B fine-tuning fails on AMD MI300X

A fine-tuning attempt of the Qwen3 32B model on AMD MI300X hardware encountered significant issues, leading to wasted resources and a lack of learning. The process reportedly consumed $10 in GPU credits before it was re…

TOOL · CL_59358 · May 29 · 09:47

Kog AI achieves 3,000 tokens/s LLM inference on standard GPUs

Kog AI has launched a tech preview of its Kog Inference Engine (KIE), demonstrating significantly faster real-time LLM inference speeds on standard datacenter GPUs. The engine achieves 3,000 output tokens per second on …

TOOL · CL_54717 · May 27 · 12:58

Triton MoE kernel achieves high performance on AMD, NVIDIA

A new fused Mixture-of-Experts (MoE) dispatch kernel, written entirely in Triton, achieves 89-131% of the performance of Stanford's Megablocks library. This kernel notably runs on AMD MI300X hardware without any code mo…

TOOL · CL_26256 · May 11 · 09:03

MachinaCheck uses specialized agents to ensure manufactured parts are machinable

MachinaCheck is a novel multi-agent system designed to bridge the gap between CAD design and CNC manufacturing, ensuring parts are machinable before production. This system utilizes specialized agents that parse geometr…

RESEARCH · CL_15158 · May 4 · 23:15

Zyphra's TSP strategy boosts LLM training throughput by 2.6x

Zyphra has developed a new technique called Tensor and Sequence Parallelism (TSP) designed to optimize the training and inference of large transformer models. This hardware-aware strategy combines aspects of Tensor Para…

RESEARCH · CL_25306 · Dec 22 · 00:20

MachinaCheck automates CNC manufacturability analysis using on-premise AI

A new system called MachinaCheck has been developed to automate the manufacturability assessment of CNC parts, reducing the process from an hour to 30 seconds. This multi-agent AI system leverages the Qwen 2.5 7B Instru…