Drop
PulseAugur coverage of Drop — every cluster mentioning Drop across labs, papers, and developer communities, ranked by signal.
- HRM-Text model drastically cuts LLM pretraining costs
  Researchers have developed HRM-Text, a novel Hierarchical Recurrent Model that significantly reduces the computational resources and training data required for pretraining large language models. By decoupling computatio…
- Researchers find Transformers know counts but struggle to output them
  A new paper identifies a specific bottleneck in Transformer models that hinders their ability to perform counting tasks. Researchers found that while models like Pythia, Qwen3, and Mistral store count information accura…
- Google DeepMind releases T5Gemma encoder-decoder LLMs adapted from Gemma
Google DeepMind has introduced T5Gemma, a new family of encoder-decoder large language models derived from its existing Gemma 2 models. This adaptation technique allows for flexible combinations of encoder and decoder…