FlashAttention-2
PulseAugur coverage of FlashAttention-2: every cluster mentioning FlashAttention-2 across labs, papers, and developer communities, ranked by signal.
-
Picotron framework enables LLM training on older GPUs
A developer has created Picotron, an LLM training framework designed to run on older GPUs without crashing. This framework eliminates mandatory GPU-specific dependencies, allowing it to function on any GPU supporting Py…
-
New Causal-rCM recipe accelerates autoregressive video diffusion
Researchers have introduced Causal-rCM, a novel open recipe for autoregressive video diffusion distillation. This framework unifies teacher-forcing and self-forcing paradigms to enhance streaming video generation and in…
-
Subquadratic AI unveils SubQ 1.1 Small with 12M token context
Subquadratic AI has released its new model, SubQ 1.1 Small, which utilizes Smart Sparse Attention to achieve near-perfect long-context retrieval up to 12 million tokens. This model significantly reduces computational re…
-
SubQ unveils SubQ 1.1 Small with 12M-token context and sparse attention
SubQ has released its SubQ 1.1 Small model, featuring a new Subquadratic Sparse Attention (SSA) architecture designed to overcome the quadratic scaling limitations of traditional attention mechanisms. This new architect…
-
ByteDance releases Bernini open-source video generation framework
ByteDance has released Bernini, an open-source framework for video generation and editing. The system combines a multimodal large language model for semantic planning with a DiT-based renderer. Bernini reportedly achiev…
-
Stanford's ThunderKittens DSL optimizes AI kernel performance
A new article details ThunderKittens, a compact domain-specific language (DSL) developed at Stanford's Hazy Research Lab for creating high-performance AI kernels. The DSL aims to strike a balance between research produc…
-
Sigmoid attention improves biological foundation models with faster, stable training
Researchers have developed a new attention mechanism called Sigmoid Attention, which offers significant improvements for training biological foundation models. This novel approach leads to better learned representations…
-
Google AI optimizes cloud computing with LAVA, Together AI expands GPU cloud, and Modal streamlines AI/ML deployment
Google DeepMind researchers have developed LAVA, a new AI-driven scheduling algorithm designed to optimize resource allocation in cloud data centers. LAVA continuously re-predicts virtual machine (VM) lifetimes, adaptin…