ROCm
PulseAugur coverage of ROCm: every cluster mentioning ROCm across labs, papers, and developer communities, ranked by signal.
-
LLMs haven't spurred competition against NVIDIA's CUDA, user asks why
The user questions why LLMs, despite their coding capabilities, haven't significantly accelerated the development of alternative software ecosystems like ROCm and Intel's stack to compete with NVIDIA's CUDA. They observ…
-
AMD Strix Halo NPUs Now Usable for LLM Inference with Lemonade Software
Newly released software called Lemonade enables the Neural Processing Unit (NPU) on AMD Strix Halo devices to run large language models. This allows for hybrid models that leverage b…
-
Qualcomm acquires AI chip software firm Modular for $4B
Qualcomm is acquiring chip software startup Modular for nearly $4 billion in a deal that includes $300 million for Modular employees. This acquisition aims to bolster Qualcomm's expansion beyond mobile chips into areas …
-
Ideogram 4 LoRA training achieved on AMD Strix Halo with ROCm
A user successfully trained an Ideogram 4 face LoRA on an AMD Strix Halo APU using ROCm and the AI-Toolkit. The process involved several AMD-specific challenges, including the incompatibility of bitsandbytes, issues wit…
-
AMD ships ATOM + ATOMesh for ROCm LLM serving with disaggregation
AMD has released ATOM and ATOMesh, a new LLM serving stack designed for its Instinct GPUs and ROCm software. This stack introduces a technique called prefill/decode disaggregation, which separates the compute-intensive …
-
MoonMath AI open-sources AMD MI300X attention kernel outperforming AITER v3 · 3 sources tracked
MoonMath AI has released an open-source HIP attention kernel for AMD's MI300X GPU, which reportedly outperforms AMD's own AITER v3. The kernel achieves speedups of up to 1.26x by optimizing memory placement and using on…
-
Ideogram 4 LoRA training detailed for AMD hardware and style examples
Users are sharing their experiences and results training LoRAs for the Ideogram 4 model, a diffusion model praised for its open-source capabilities. One user detailed the process of training a face LoRA on an AMD Strix …
-
User asks about Linux/ROCm performance boost for AMD R9700
A user is inquiring about the potential performance gains of switching from a Windows-based AMD R9700 setup to Linux with ROCm for running Wan-2.2. They are seeking community experiences to determine if the effort is wo…
-
AMD launches $3999 mini-PC for local AI development
AMD has begun accepting pre-orders for its new "Ryzen AI Halo" development machine, priced at $3999 (approximately 640,000 JPY). This compact PC is designed to run large AI models, including those with up to 200 billion…
-
llama.cpp Releases Enhance Performance and Add New Features
The llama.cpp project has released several updates, including b9608, which features an update to cpp-httplib and provides pre-compiled binaries for various platforms like macOS, Linux, Android, and Windows. Release b960…
-
Step-3.7-Flash on AMD/ROCm faces context corruption and requires thinking budget
A user running the Step-3.7-Flash model on AMD hardware with ROCm has identified two key issues. First, ROCm appears to corrupt context windows beyond approximately 94,000 tokens, causing the model to loop and fail to p…
-
New tool visualizes NPU and iGPU activity on AMD Strix Halo
A new terminal monitoring tool called xdna-top has been released to help users visualize the activity of NPUs and iGPUs on AMD's Strix Halo processors. This tool addresses the current difficulty in tracking NPU performa…
-
User script boosts SDXL performance on older AMD GPUs
A user has developed a script to enable Stable Diffusion XL (SDXL) to run more efficiently on older AMD GPUs with 8GB of VRAM. The script bypasses the problematic DirectML backend on Windows, opting instead for native R…
-
AMD MI50 GPUs show strong performance with llama.cpp on Debian
A user on Reddit's r/LocalLLaMA shared performance benchmarks for AMD MI50 GPUs running the llama.cpp software on Debian Testing. The benchmarks, conducted using the llama-benchy tool with the unsloth/Qwen3.6-35B-A3B-GG…
-
BC250 device performance benchmarked with custom llama.cpp setup
A user on Reddit shared performance metrics for a BC250 device running Fedora 44 with a customized llama.cpp setup. The user detailed their process of overclocking the device to 2 GHz and unlocking 40 Compute Units, whic…
-
Unsloth Studio adds Gemma 4 12B, new UI, and live tools
Unsloth has released a beta update (v0.1.44-beta) that includes a new chat UI, project management features, and experimental canvas capabilities. This update also integrates Google's Gemma 4 12B model, which can run loc…
-
ROCm vs CUDA: Choosing the Right AI Development Platform
This article compares ROCm and CUDA, two prominent platforms for AI development. It details the author's personal experience attempting to train a PyTorch model on an AMD GPU using ROCm, highlighting the challenges enco…
-
AMD ROCm improves support for Windows Subsystem for Linux 2
AMD's ROCm platform now offers improved support for Windows Subsystem for Linux 2 (WSL2), enabling users to run Linux-based AI workloads more effectively on Windows. While this update brings the system closer to a stabl…
-
Mistral.rs boosts CUDA inference speed; non-CUDA status debated
The mistral.rs project has released version 0.8.2, significantly improving CUDA inference speeds by up to 2.8 times compared to llama.cpp on various NVIDIA GPUs. This update focuses on optimizing throughput for models l…
-
Ollama v0.30.0-rc32 improves multi-GPU support and embeddings API
Ollama has published release candidate v0.30.0-rc32, which includes several follow-up fixes and improvements for its llama-server functionality. These updates address issues with ROCm build flags for multi-GPU …