PulseAugur

Ornith 35B model enhanced with MTP for faster agentic coding

A developer has added Multi-Token Prediction (MTP) to the Ornith 35B model to speed up agentic coding workloads. The change reportedly raises inference speed by 18% and reaches a 70% drafter acceptance rate. The model uses FP8 E4M3 quantization, targets hardware with more than 80 GB of VRAM, and supports a 256k context window; it may also be usable on unified-memory systems.
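As a rough illustration of how a drafter acceptance rate relates to throughput, here is a minimal sketch of the standard speculative-decoding estimate. It assumes the 70% figure is a per-token acceptance probability, that drafted tokens are accepted independently, and that each verification step also yields one token of its own. The draft length `k` and relative drafting cost `draft_cost` are hypothetical parameters; the summary does not state either, nor how the reported numbers were measured.

```python
def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumption (not from the post): each of the k drafted tokens is
    accepted independently with probability p, and the verification
    step always contributes one token itself.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if p == 1.0:
        # Every drafted token is accepted, plus the verifier's own token.
        return float(k + 1)
    # Geometric series 1 + p + p^2 + ... + p^k.
    return (1.0 - p ** (k + 1)) / (1.0 - p)


def estimated_speedup(p: float, k: int, draft_cost: float) -> float:
    """Idealized speedup over plain decoding.

    draft_cost is a hypothetical ratio: the time to draft one token
    divided by the time of one target-model step.
    """
    if draft_cost < 0.0:
        raise ValueError("draft_cost must be non-negative")
    return expected_tokens_per_step(p, k) / (1.0 + k * draft_cost)


if __name__ == "__main__":
    for cost in (0.0, 0.2, 0.4):
        print(cost, round(estimated_speedup(0.7, 1, cost), 3))
```

Under these assumptions, `p = 0.7` with a single drafted token gives about 1.7 tokens per step when drafting is free. The reported 18% gain is well below that ceiling, which in this toy model would correspond to non-trivial drafting and verification overhead, though the summary gives no breakdown.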

IMPACT Potential for improved efficiency in agentic coding tasks for users with high-end hardware.

RANK_REASON Developer-led integration of a new feature into an existing open-source model.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/kyr0x0

    I added MTP to local SoTA Agentic Coding Model Ornith 35B FP8 E4M3

    https://www.reddit.com/r/LocalLLaMA/comments/1ul3rr2/i_added_mtp_to_local_sota_agentic_coding_model/