The openPangu-2.0-Flash model is a new Mixture-of-Experts (MoE) model with 92 billion total parameters, of which 6 billion are activated. It supports a 512k-token context length and was trained on 34 trillion tokens. Key changes include an efficient attention mechanism that combines local and global context, a novel residual topology for richer representations, and multi-token prediction for faster inference. Training used the Muon optimizer.
AI IMPACT The model's large context window and efficient attention could enable new applications in long-form text analysis and generation.
RANK_REASON Frontier-lab model release with system card.
AI-generated summary · Google Gemini · from 1 source.