AMD MI300X
PulseAugur coverage of AMD MI300X — every cluster mentioning AMD MI300X across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 3 days
-
MoonMath AI open-sources HIP attention kernel for AMD MI300X, beating AITER v3
MoonMath AI has open-sourced a new bf16 forward attention kernel for AMD's MI300X GPU, written in HIP. This kernel reportedly outperforms AMD's own AITER v3 across various configurations, achieving up to a 1.26x speedup…
-
MoonMath AI open-sources AMD MI300X attention kernel outperforming AITER v3 · 3 sources tracked
MoonMath AI has released an open-source HIP attention kernel for AMD's MI300X GPU, which reportedly outperforms AMD's own AITER v3. The kernel achieves speedups of up to 1.26x by optimizing memory placement and using on…
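Neither summary describes the kernel's internals beyond memory placement. For context, many fused forward attention kernels follow the FlashAttention approach: they walk the keys and values block by block with an "online softmax", so the full score matrix never has to be stored. The pure-Python sketch below assumes that technique. It is not MoonMath's HIP code, it runs in float64 rather than bf16, and the names `naive_attention` and `tiled_attention` are hypothetical. It checks that the block-wise version matches a direct softmax(Q K^T / sqrt(d)) V.

```python
import math


def naive_attention(q, k, v):
    """Reference softmax(q k^T / sqrt(d)) v, with matrices as lists of rows."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block=2):
    """Same result, visiting keys/values one block at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        m = -math.inf          # running max of all scores seen so far
        l = 0.0                # running sum of exp(score - m)
        acc = [0.0] * len(v[0])  # running unnormalized output row
        for start in range(0, len(k), block):
            kb = k[start:start + block]
            vb = v[start:start + block]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in kb]
            m_new = max(m, max(scores))
            # Rescale the old statistics to the new max (exp(-inf) == 0.0 on the first block).
            correction = math.exp(m - m_new)
            l *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, vb):
                w = math.exp(s - m_new)
                l += w
                acc = [a + w * x for a, x in zip(acc, vj)]
            m = m_new
        out.append([a / l for a in acc])  # normalize once at the end
    return out


# Example: 2 queries, 3 keys/values, head dimension 2.
q = [[1.0, 0.0], [0.5, -0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reference = naive_attention(q, k, v)
blocked = tiled_attention(q, k, v, block=2)
```

Here `block` stands in for the tile size a real kernel would pick to fit on-chip memory. The result does not depend on it, up to floating-point rounding.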
-
Qwen3 32B fine-tuning fails on AMD MI300X
A fine-tuning attempt of the Qwen3 32B model on AMD MI300X hardware ran into significant problems: it wasted compute and the model failed to learn. The process reportedly consumed $10 in GPU credits before it was re…
-
Kog AI achieves 3,000 tokens/s LLM inference on standard GPUs
Kog AI has launched a tech preview of its Kog Inference Engine (KIE), demonstrating significantly faster real-time LLM inference on standard datacenter GPUs. The engine achieves 3,000 output tokens per second on …
-
Triton MoE kernel achieves high performance on AMD, NVIDIA
A new fused Mixture-of-Experts (MoE) dispatch kernel, written entirely in Triton, achieves 89-131% of the performance of Stanford's Megablocks library. This kernel notably runs on AMD MI300X hardware without any code mo…
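A MoE dispatch step routes each token to its top-k experts and then groups tokens by expert, so each expert's work can run as one contiguous batch. The item does not say how the Triton kernel does this. The pure-Python sketch below assumes a common counting-sort formulation, and `topk_route` and `dispatch` are hypothetical names, not the API of the kernel or of Megablocks.

```python
def topk_route(logits, k):
    """Pick the k highest-scoring experts per token (ties go to the lower index)."""
    routes = []
    for row in logits:
        order = sorted(range(len(row)), key=lambda e: (-row[e], e))
        routes.append(order[:k])
    return routes


def dispatch(routes, num_experts):
    """Group token assignments by expert with a counting sort.

    Returns (offsets, token_ids): the tokens routed to expert e are
    token_ids[offsets[e]:offsets[e + 1]].
    """
    counts = [0] * num_experts
    for experts in routes:
        for e in experts:
            counts[e] += 1
    # Exclusive prefix sum gives each expert's start position.
    offsets = [0] * (num_experts + 1)
    for e in range(num_experts):
        offsets[e + 1] = offsets[e] + counts[e]
    cursor = offsets[:-1]  # copy: next free slot for each expert
    token_ids = [0] * offsets[-1]
    for t, experts in enumerate(routes):
        for e in experts:
            token_ids[cursor[e]] = t
            cursor[e] += 1
    return offsets, token_ids


# Example: 4 tokens, 3 experts, top-2 routing.
logits = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.5], [0.3, 0.4, 0.9], [0.7, 0.1, 0.6]]
routes = topk_route(logits, k=2)                  # [[1, 0], [0, 2], [2, 1], [0, 2]]
offsets, token_ids = dispatch(routes, num_experts=3)
# offsets == [0, 3, 5, 8]; token_ids == [0, 1, 3, 0, 2, 1, 2, 3]
```

Each slice `token_ids[offsets[e]:offsets[e + 1]]` is the batch of tokens that expert `e` would process, for example in a grouped matrix multiply.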
-
MachinaCheck uses specialized agents to ensure manufactured parts are machinable
MachinaCheck is a novel multi-agent system designed to bridge the gap between CAD design and CNC manufacturing, ensuring parts are machinable before production. This system utilizes specialized agents that parse geometr…
-
Zyphra's TSP strategy boosts LLM training throughput by 2.6x
Zyphra has developed a new technique called Tensor and Sequence Parallelism (TSP) designed to optimize the training and inference of large transformer models. This hardware-aware strategy combines aspects of Tensor Para…
-
MachinaCheck automates CNC manufacturability analysis using on-premise AI
A new system called MachinaCheck has been developed to automate the manufacturability assessment of CNC parts, reducing the process from an hour to 30 seconds. This multi-agent AI system leverages the Qwen 2.5 7B Instru…