A dataset released on GitHub details which local Large Language Models (LLMs) can run on RAM tiers ranging from 8GB to 128GB. It offers a rule of thumb: a model at Q4_K_M quantization requires approximately 0.6GB of memory per billion parameters, and users should plan to fill only about 70% of their available RAM or VRAM with the model, leaving the remainder for the operating system, context, and KV cache. The resource lists specific models, quantization levels, load sizes, and command-line instructions for running them, with a focus on Apple Silicon and consumer NVIDIA hardware.
AI IMPACT Gives users the data they need to select and run LLMs on consumer hardware, lowering the barrier to entry for local AI model deployment.
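The rule of thumb in the summary can be turned into a quick back-of-the-envelope check. The sketch below is not taken from the dataset itself: the function names are hypothetical, the intermediate tiers (16GB, 32GB, 64GB) are assumed for illustration since the summary only states the 8GB and 128GB endpoints, and the 0.6GB-per-billion figure applies only to Q4_K_M quantization.

```python
# Rough sizing sketch based on the rule of thumb quoted in the summary.
# All names here are illustrative; none come from the dataset's repository.

GB_PER_BILLION_PARAMS_Q4_K_M = 0.6  # approx. memory per billion parameters at Q4_K_M
USABLE_FRACTION = 0.70              # share of RAM/VRAM budgeted for model weights


def usable_memory_gb(total_gb: float) -> float:
    """Memory budget for weights, leaving headroom for OS, context, KV cache."""
    return total_gb * USABLE_FRACTION


def estimated_load_gb(params_billion: float) -> float:
    """Approximate load size of a Q4_K_M model with the given parameter count."""
    return params_billion * GB_PER_BILLION_PARAMS_Q4_K_M


def max_params_billion(total_gb: float) -> float:
    """Largest Q4_K_M model (in billions of parameters) that fits the budget."""
    return usable_memory_gb(total_gb) / GB_PER_BILLION_PARAMS_Q4_K_M


def fits(params_billion: float, total_gb: float) -> bool:
    """True if the estimated load size stays within the usable memory budget."""
    return estimated_load_gb(params_billion) <= usable_memory_gb(total_gb)


if __name__ == "__main__":
    # Assumed tiers: only 8GB and 128GB are named in the summary.
    for tier_gb in (8, 16, 32, 64, 128):
        budget = usable_memory_gb(tier_gb)
        largest = max_params_billion(tier_gb)
        print(f"{tier_gb:>3}GB tier: ~{budget:.1f}GB usable, "
              f"up to ~{largest:.0f}B parameters at Q4_K_M")
```

Under these assumptions, a 7B model (about 4.2GB) fits an 8GB machine, while a 70B model (about 42GB) needs roughly a 64GB tier. Real load sizes vary by model architecture and context length, so the dataset's per-model figures should take precedence over this estimate.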
RANK_REASON Dataset release detailing hardware requirements for local LLMs.
AI-generated summary · Google Gemini · from 1 source.