<h1>Release v5.9.0</h1> <h2>New Model additions</h2> <h3>Cohere2Moe</h3> <p>Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and rout…
<h1>Patch release v5.8.1</h1> <p>This release is mainly to fix the Deepseek V4 integration!!!</p> <a href="https://private-user-images.githubusercontent.com/48595927/591488772-0d85e891-a0ff-436e-a9d4-b6633096f2b5.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY2…
<h1>Patch release v5.6.2</h1> <p>Qwen 3.5 and 3.6 MoE (text-only) were broken when used with FP8. They should work again with this release 🫡</p> <ul> <li>Fix configuration reading and error handling for kernels (<a class="issue-link js-issue-link" href="https://github.com/huggingface/…
<h1>Patch release v5.6.1</h1> <p>The flash attention path was broken! Sorry everyone for this one 🤗</p> <ul> <li>Fix AttributeError on s_aux=None in flash_attention_forward (<a class="issue-link js-issue-link" href="https://github.com/huggingface/transformers/pull/45589">#45589</a>) …
<h1>Release v5.6.0</h1> <h2>New Model additions</h2> <h3>OpenAI Privacy Filter</h3> <p>OpenAI Privacy Filter is a bidirectional token-classification model for personally identifiable information (PII) detection and masking in text. It is intended for high-throughput data sanitiza…
<h1>Patch release v5.5.4</h1> <p>This release mostly contains fixes that are good to have ASAP, mainly for tokenizers:<br /> ** Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex Attribute… (<a class="issue-link js-issue-link" href="https://github.com/huggingface/transformers/iss…
<p>Small patch release to fix <code>device_map</code> support for Gemma4! It contains the following commit:</p> <ul> <li>[gemma4] Fix device map auto (<a class="issue-link js-issue-link" href="https://github.com/huggingface/transformers/pull/45347">#45347</a>) by <a class="user-m…
<p>Small patch dedicated to optimizing gemma4, fixing inference with <code>use_cache=False</code> due to k/v states sharing between layers, as well as conversion mappings for some models that would inconsistently serialize their weight names. It contains the following PRs:</p> <u…