Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools
Lemonade 10.8 improves memory and context management for local AI models. Dynamic VRAM management automatically unloads idle models and resizes KV caches to free GPU memory, and context length is now sized automatically from available memory and model architecture. The release also expands cloud offload, letting users combine OpenAI-compatible providers with local models. In addition, Lemonade 10.8 enhances its LMX-Omni image generation features and introduces an MCP gateway that exposes local models as tools for tasks such as chat, transcription, and image generation.
AI IMPACT Enhances local AI model usability and integration with cloud services, potentially streamlining workflows for AI developers.