Inception Labs has released Mercury 2, a novel diffusion model capable of reasoning and generating text at over 1000 tokens per second. This model achieves significant speed improvements by utilizing parallel crystallization instead of linear writing, resulting in an 82% reduction in latency compared to traditional models. Mercury 2 also outperforms Google's DiffusionGemma in benchmark tests, challenging conventional LLM architectures. AI
IMPACT Sets a new benchmark for LLM inference speed and challenges existing architectures with its parallel crystallization approach.
RANK_REASON Frontier-lab model release with system card.
Read on Mastodon — mastodon.social →
AI-generated summary · Google Gemini · from 3 sources. How we write summaries →