Jetson AGX Orin 64GB: q8_0 good, q6_k bad
A user on the r/LocalLLaMA subreddit shared performance observations for the Jetson AGX Orin 64GB, reporting that q8_0 quantization gave significantly faster prompt processing than q6_k and q4_k_xl. Testing the Unsloth Qwen3.6-27B-MTP-GGUF model on a recent llama.cpp build, the user observed a speedup of more than 20% with q8_0. Because memory bandwidth does not appear to be the limiting factor, they hypothesize that the Jetson's CUDA cores may not handle the lower-bit quantization formats efficiently on this specific hardware.
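The sketch below is a rough, hedged illustration of why a smaller file need not mean faster prompt processing. It assumes the block layouts of ggml's `block_q8_0` (one fp16 scale plus 32 int8 values) and `block_q6_K` (128 bytes of low 4-bit parts, 64 bytes of high 2-bit parts, 16 int8 sub-block scales and one fp16 scale per 256 weights). q6_K reads roughly 23% fewer bytes per weight, but each weight needs more bit manipulation to unpack. The helpers `weight_gb`, `dequant_q8_0` and `dequant_6bit` are hypothetical, and `dequant_6bit` is a simplified version of the idea, not ggml's exact bit order. It also does not show how llama.cpp's CUDA kernels actually run on the Orin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockFormat:
    """Storage layout of one quantization block (assumed from ggml's structs)."""
    name: str
    weights_per_block: int
    bytes_per_block: int

    @property
    def bits_per_weight(self) -> float:
        return 8 * self.bytes_per_block / self.weights_per_block


# q8_0: fp16 scale (2 bytes) + 32 int8 quants = 34 bytes per 32 weights.
Q8_0 = BlockFormat("q8_0", 32, 2 + 32)
# q6_K: 128 B low nibbles + 64 B high 2-bit parts + 16 int8 scales + fp16 scale.
Q6_K = BlockFormat("q6_K", 256, 128 + 64 + 16 + 2)


def weight_gb(fmt: BlockFormat, n_params: float) -> float:
    """Approximate weight footprint in GB (1e9 bytes), if every tensor used fmt."""
    return n_params * fmt.bits_per_weight / 8 / 1e9


def dequant_q8_0(d: float, q: int) -> float:
    # q8_0: one multiply per weight, the int8 value is used directly.
    return d * q


def dequant_6bit(d: float, sub_scale: int, low_nibble: int, high2: int) -> float:
    # Simplified 6-bit unpack (hypothetical helper, not ggml's exact layout):
    # rebuild the value from a 4-bit low part and a 2-bit high part,
    # recentre it to a signed range, then apply two scales.
    q = (low_nibble & 0xF) | ((high2 & 0x3) << 4)
    return d * sub_scale * (q - 32)


if __name__ == "__main__":
    for fmt in (Q8_0, Q6_K):
        print(f"{fmt.name}: {fmt.bits_per_weight} bits/weight, "
              f"~{weight_gb(fmt, 27e9):.2f} GB for 27B params")
```

The footprints are approximations. Real GGUF files keep some tensors in other formats, and Unsloth's q4_k_xl is a mix of formats rather than a single block type, so it is left out here.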
IMPACT: Performance insights for running large language models on edge devices such as the Jetson AGX Orin.