The GGUF build of the Ornith-1.0-35B model has been updated with a native Multi-Token Prediction (MTP) speculative-decoding graft. The update raises single-stream decode speed by 1.3x to 1.35x, reaching up to 233.8 tokens per second. The model keeps a low Kullback–Leibler divergence (KLD) of 0.073, which is lower, and therefore better, than that of the Q4_K_M quantization, and it offers improved performance in long-context scenarios.
AI IMPACT Enhances local LLM performance and efficiency for users running models on consumer hardware.
RANK_REASON Update to an existing open-source model with performance improvements and new features.
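The summary does not say how the KLD figure of 0.073 was measured. As a rough illustration only, the sketch below assumes the common approach of averaging the per-token KL divergence between a reference model's next-token distribution and the quantized model's, measured in nats. The function names and the toy logits are hypothetical and are not taken from the release.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(ref_logits, test_logits):
    """Average per-token KL divergence between reference and test model logits."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("both models must be scored on the same token positions")
    kls = [
        kl_divergence(softmax(r), softmax(t))
        for r, t in zip(ref_logits, test_logits)
    ]
    return sum(kls) / len(kls)


if __name__ == "__main__":
    # Hypothetical toy logits for two token positions over a 3-token vocabulary.
    reference = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
    quantized = [[1.9, 1.1, 0.1], [0.6, 2.4, 0.4]]
    print(f"mean KLD: {mean_kld(reference, quantized):.4f} nats")
```

A value of 0 would mean the quantized model reproduces the reference distribution exactly at every position, so smaller numbers indicate less distortion from quantization.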
AI-generated summary · Google Gemini · from 1 source.