llama.cpp tensor split mode causes CUDA error with Qwen model

By PulseAugur Editorial · [1 source] · 2026-06-03 12:38

A user encountered a CUDA error when attempting to load a Qwen-3.6-27b model with tensor split mode enabled in the latest version of llama.cpp. The error message indicates that the `llama_params_fit` function is not implemented for tensor split mode, so fitting the parameters to device memory fails. The issue occurred on a system with dual 3090 GPUs running Ubuntu Server 24.04 and CUDA 13.0.
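For context on why a multi-GPU split is involved at all, the rough arithmetic below estimates the weight footprint of a 27B-parameter model at 8-bit quantization against the 24 GiB of a single 3090. This is a back-of-the-envelope sketch, not a figure from the original report: the 8.5 bits-per-weight value is an assumption based on Q8_0-style block quantization, the actual UD-Q8_K_XL mix may differ, and KV cache and compute buffers are ignored.

```python
# Rough weight-memory estimate; every figure here is an illustrative assumption.
PARAMS = 27e9            # parameter count implied by the "27b" in the model name
BITS_PER_WEIGHT = 8.5    # assumed: Q8_0-style blocks (8-bit values plus scales)
GPU_VRAM_GIB = 24        # memory per RTX 3090 card
NUM_GPUS = 2             # dual-GPU system described in the report


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB (excludes KV cache and buffers)."""
    return params * bits_per_weight / 8 / 2**30


weights = weight_gib(PARAMS, BITS_PER_WEIGHT)
fits_one_gpu = weights <= GPU_VRAM_GIB
fits_both_gpus = weights <= GPU_VRAM_GIB * NUM_GPUS

print(f"weights ~{weights:.1f} GiB")
print(f"fits on one GPU: {fits_one_gpu}, fits across two GPUs: {fits_both_gpus}")
```

Under these assumptions the weights alone come to roughly 26.7 GiB, which exceeds one card but fits within the combined 48 GiB, so some form of splitting across the two GPUs is needed before context memory is even counted.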

IMPACT This issue highlights potential compatibility problems when using advanced features like tensor split mode with specific model quantizations and hardware setups in local LLM deployments.

RANK_REASON User-reported technical issue with open-source software and hardware configuration.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Blues520 · 2026-06-03 12:38

Tensor split mode: CUDA error on latest llama.cpp with Qwen-3.6-27b

Hi guys, I am running into issues when loading the Unsloth UD-Q8_K_XL quant and wanted to check if anyone has run into this. I updated my config to also use `--split-mode tensor` but wanted to check if I need to update drivers/CUDA to get it workin…
