Release v5.11.0
The Hugging Face Transformers library has released version 5.12.0, which introduces new models such as MiniMax-M3-VL, a vision-language model that pairs a CLIP-style vision tower with a sparse Mixture-of-Experts decoder. The update also includes improvements to PP-OCRv6, an efficient OCR system, and Parakeet-RNNT, which pairs a fast conformer encoder with an RNN-T decoder. Version 5.11.0 added DiffusionGemma, an encoder-decoder model for faster text generation, and DeepSeek-V3.2-Exp, which features a novel sparse attention mechanism for long-context efficiency.