v0.30.0-rc32: llama-server followups (#16353)
Ollama has published v0.30.0-rc32, a release candidate with several follow-up fixes and improvements to its llama-server functionality. The changes fix ROCm build flags for multi-GPU support on Windows, improve AMD HIP version detection, and make the embeddings API behave consistently. The release also tunes batch sizes for systems with constrained VRAM, fixes a bug when loading v3 models in Imagegen, and improves how models are reloaded for embeddings.
IMPACT Improves multi-GPU support and API consistency for users running LLMs locally with Ollama.