Google's QAT Q4_0 models have more precision than Unsloth's Q4_K_XL (at least some)
A user on r/LocalLLaMA has observed that Google's QAT (Quantization-Aware Training) Q4_0 models appear to retain more precision than Unsloth's Q4_K_XL variants, contrary to expectations. The observation is based on file sizes and tensor data: Google's Q4_0 files are sometimes larger than Unsloth's Q4_K_XL files, which suggests the two publishers use different quantization strategies or implementations. The user is asking why this discrepancy occurs and how to properly analyze tensor data within GGUF files.
AI IMPACT This comparison highlights potential differences in quantization techniques that affect model size and performance in local LLM deployments.
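One way to reason about such a size gap is to compute the average bits per weight from the per-tensor quantization types, since a GGUF file's quantization label describes the dominant type and not necessarily every tensor. The sketch below is only an illustration: the block layouts are the standard ggml ones (for example, Q4_0 stores 32 weights in 18 bytes and Q4_K stores 256 weights in 144 bytes, so both come to 4.5 bits per weight), but the two tensor inventories `mix_a` and `mix_b` are invented toy data, not measurements of any Google or Unsloth file.

```python
from collections import defaultdict

# (bytes per block, weights per block) for a few ggml tensor types.
BLOCK_LAYOUT = {
    "F16": (2, 1),
    "Q8_0": (34, 32),
    "Q6_K": (210, 256),
    "Q5_K": (176, 256),
    "Q4_K": (144, 256),
    "Q4_0": (18, 32),
}


def bits_per_weight(qtype):
    """Storage cost of one weight, in bits, for a given tensor type."""
    block_bytes, block_weights = BLOCK_LAYOUT[qtype]
    return block_bytes * 8 / block_weights


def summarize(tensors):
    """Return (average bits per weight, bytes per type).

    `tensors` is an iterable of (name, qtype, n_elements) tuples;
    n_elements is assumed to be a multiple of the block size.
    """
    bytes_by_type = defaultdict(int)
    total_bytes = 0
    total_weights = 0
    for _name, qtype, n_elements in tensors:
        block_bytes, block_weights = BLOCK_LAYOUT[qtype]
        size = (n_elements // block_weights) * block_bytes
        bytes_by_type[qtype] += size
        total_bytes += size
        total_weights += n_elements
    return total_bytes * 8 / total_weights, dict(bytes_by_type)


# Hypothetical inventories (toy numbers, NOT taken from real model files).
# mix_a: "Q4_0" file whose embedding table is left in F16.
mix_a = [
    ("token_embd.weight", "F16", 1024 * 256),
    ("blk.0.ffn_down.weight", "Q4_0", 4096 * 256),
]
# mix_b: "Q4_K" style file whose embedding table is stored as Q6_K.
mix_b = [
    ("token_embd.weight", "Q6_K", 1024 * 256),
    ("blk.0.ffn_down.weight", "Q4_K", 4096 * 256),
]

bpw_a, sizes_a = summarize(mix_a)
bpw_b, sizes_b = summarize(mix_b)
print(f"mix_a: {bpw_a:.4f} bits/weight, bytes by type: {sizes_a}")
print(f"mix_b: {bpw_b:.4f} bits/weight, bytes by type: {sizes_b}")
```

In this toy case the file labelled Q4_0 ends up with the higher average bits per weight, purely because one large tensor is kept at higher precision. For a real file, the same (name, type, element count) tuples would have to be read from the GGUF tensor metadata, for instance with the `gguf` Python package that ships with llama.cpp.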