MiniMax has released its M3 model, which reportedly uses significantly less compute per token than its predecessor. The company claims the new model is nine times faster during prefill and fifteen times faster during decoding, and that it supports a context window of one million tokens.
IMPACT If the company's claims hold, this release points to significant efficiency gains in LLM inference, which could lower serving costs and enable new applications that rely on larger context windows.
RANK_REASON Frontier-lab model release with system card.
Source: Mastodon (mastodon.social)
AI-generated summary · Google Gemini · from 1 source.