Pythia 70M
PulseAugur coverage of Pythia 70M — every cluster mentioning Pythia 70M across labs, papers, and developer communities, ranked by signal.
-
Weibull framework reveals AdamW training dynamics in transformers
A new research paper explores the evolution of weight-scale parameters in transformer models during AdamW training. The study derives a three-force decomposition of the squared weight norm, identifying alignment, inject…
-
Researchers find independently trained transformers compute same function via random rotation
Researchers have discovered a phenomenon called "polymorphism" in independently trained transformers: the models compute the same function but use different internal coordinate systems that are rotated versions of each …
-
New methods enhance sparse autoencoder interpretability and stability
Researchers have developed new methods to address limitations in sparse autoencoders (SAEs), which are used to interpret the internal representations of large language models. One paper introduces adaptive elastic net S…