ICYMI: llama.cpp b9455 --sm tensor KV Cache Fix Is Merged
The llama.cpp project has merged a fix (b9455) for KV cache problems that occur when the --sm tensor flag is used on multi-GPU setups. The change, developed by Johannes Gaessler, preserves shape information when tensors are flattened, so the meta backend can handle KV cache rotation correctly. Instead of working around the problem by altering the compute graphs, the fix extends the meta backend's capabilities.
AI IMPACT: Improves performance and stability for users running LLMs locally on multi-GPU configurations.
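
As a rough illustration of why shape information matters once a tensor has been flattened, here is a toy sketch in plain Python. It is not llama.cpp or ggml code: the function names, the 2x4 example tensor, and the two-way split are all assumptions made for illustration, and the actual fix may work quite differently. It only shows that a splitter which sees a flat buffer alone can cut contiguous chunks, while one that still knows the original shape can split along a chosen axis.

```python
def split_flat(flat, n_parts):
    """Shape-unaware split (hypothetical): cut the flat buffer into contiguous chunks."""
    step = len(flat) // n_parts
    return [flat[i * step:(i + 1) * step] for i in range(n_parts)]


def split_with_shape(flat, shape, axis, n_parts):
    """Shape-aware split (hypothetical) of a row-major 2D tensor along `axis`."""
    rows, cols = shape
    assert len(flat) == rows * cols, "flat buffer must match the recorded shape"
    parts = [[] for _ in range(n_parts)]
    for r in range(rows):
        for c in range(cols):
            # Pick the coordinate and extent of the axis being split.
            idx, size = (r, rows) if axis == 0 else (c, cols)
            # Map that coordinate to the part (device) that owns it.
            parts[idx * n_parts // size].append(flat[r * cols + c])
    return parts


# A 2x4 tensor stored row-major: [[0, 1, 2, 3], [4, 5, 6, 7]]
flat = list(range(8))

naive = split_flat(flat, 2)                                   # can only split by rows
by_cols = split_with_shape(flat, (2, 4), axis=1, n_parts=2)   # splits by columns

print(naive)
print(by_cols)
```

In this toy case the contiguous split coincides with a split along axis 0, but a split along axis 1 cannot be expressed without the recorded shape.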