SmolLM2-135M
PulseAugur coverage of SmolLM2-135M — every cluster mentioning SmolLM2-135M across labs, papers, and developer communities, ranked by signal.
-
Small language model trained on single GPU detailed in new study
Researchers have detailed a method for training a small language model, L20-Edu-135M, with significantly fewer computational resources: a single NVIDIA L20 GPU. The study focused on data efficiency, uti…
-
LLM Tutorial Explains Foundation, Instruct, and Chat Model Differences
A tutorial demonstrates the differences among foundation, instruct, and chat variants of large language models. It uses the SmolLM2-135M family, runnable on Google Colab without a GPU, to illustrate how models evolve f…
-
Compact Bangla LLM Outperforms Larger Models with Efficient Design
Researchers have developed a new compact language model, bangla-smollm-135m, specifically designed for the Bangla language. This 135-million parameter model achieves competitive performance against larger models by empl…
-
AutoMegaKernel compiles Llama models into single CUDA kernels
Researchers have developed AutoMegaKernel (AMK), a system that compiles HuggingFace Llama-family models into a single, persistent CUDA kernel for efficient forward passes. AMK's static validator ensures schedule safety,…
-
Jetson Orin Nano benchmarks 8 tiny LLMs across power modes
A benchmark of eight small language models (135M to ~1B parameters) was conducted on a Jetson Orin Nano Super 8GB device. The tests explored four power modes (7W, 15W, 25W, MAXN) using the llama.cpp CUDA backend. The fi…
-
FlashNorm speeds up transformer inference by optimizing normalization layers
Researchers have developed FlashNorm, a technique to accelerate normalization layers in Transformer models. By reformulating RMSNorm and folding its weights into subsequent linear layers, FlashNorm enables parallel exec…
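The weight-folding idea in the FlashNorm summary can be illustrated with a small sketch. It relies only on the standard RMSNorm definition, y = g * x / rms(x), followed by a linear layer W: because g is a per-channel scale and 1/rms(x) is a single scalar, W applied to the normalized input equals (W diag(g)) x divided by rms(x). This is not the authors' implementation. The function names (`rmsnorm`, `fold_weights`, `folded_forward`), the placement of `eps`, and the toy dimensions are all assumptions made for illustration.

```python
import math
import random


def rms(x, eps=1e-6):
    # Root mean square of the input vector (eps placement is an assumption).
    return math.sqrt(sum(v * v for v in x) / len(x) + eps)


def rmsnorm(x, g, eps=1e-6):
    # Standard RMSNorm: scale each channel by g after dividing by the RMS.
    r = rms(x, eps)
    return [gi * v / r for gi, v in zip(g, x)]


def matvec(W, x):
    # Plain matrix-vector product; W is a list of rows.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def fold_weights(W, g):
    # Precompute W @ diag(g): column j of W is scaled by g[j].
    return [[w * gj for w, gj in zip(row, g)] for row in W]


def folded_forward(W_folded, x, eps=1e-6):
    # The matmul and the RMS no longer depend on each other,
    # so they could be computed in parallel; the scalar division comes last.
    y = matvec(W_folded, x)
    r = rms(x, eps)
    return [v / r for v in y]


random.seed(0)
dim_in, dim_out = 8, 4  # hypothetical toy sizes
x = [random.gauss(0, 1) for _ in range(dim_in)]
g = [random.uniform(0.5, 1.5) for _ in range(dim_in)]
W = [[random.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]

reference = matvec(W, rmsnorm(x, g))            # normalize, then project
folded = folded_forward(fold_weights(W, g), x)  # project with folded weights, then scale
max_abs_diff = max(abs(a - b) for a, b in zip(reference, folded))
```

In this sketch the two paths agree up to floating-point rounding, and the folding is a one-time transformation of the weights rather than extra per-token work.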