We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun
MiniMax AI recently held a live session discussing their M3 model, highlighting the MiniMax Sparse Attention (MSA) mechanism. Unlike other attention methods that compress the KV cache, MSA preserves the uncompressed KV cache. This approach was developed in collaboration with the Together AI team. AI
IMPACT Highlights a novel attention mechanism that could improve model efficiency and performance.