SmolLM2-135M

PulseAugur coverage of SmolLM2-135M — every cluster mentioning SmolLM2-135M across labs, papers, and developer communities, ranked by signal.

Total: 6 over 90d
Releases: 0 over 90d
Papers: 6 over 90d

SENTIMENT · 30D: 5 days with sentiment data

RECENT · 6 TOTAL

TOOL · CL_104732 · Jun 20 · 18:42

Small language model trained on single GPU detailed in new study

Researchers have detailed a method for training a small language model, L20-Edu-135M, using significantly fewer computational resources, specifically on a single NVIDIA L20 GPU. The study focused on data efficiency, uti…

TOOL · CL_95448 · Jun 16 · 23:08

LLM Tutorial Explains Foundation, Instruct, and Chat Model Differences

A tutorial demonstrates the distinctions between foundation, instruct, and chat models in large language models. It uses the SmolLM2-135M family, runnable on Google Colab without a GPU, to illustrate how models evolve f…

RESEARCH · CL_93551 · Jun 15 · 08:16

Compact Bangla LLM Outperforms Larger Models with Efficient Design

Researchers have developed a new compact language model, bangla-smollm-135m, specifically designed for the Bangla language. This 135-million-parameter model achieves competitive performance against larger models by empl…

RESEARCH · CL_79592 · Jun 8 · 16:02

AutoMegaKernel compiles Llama models into single CUDA kernels

Researchers have developed AutoMegaKernel (AMK), a system that compiles HuggingFace Llama-family models into a single, persistent CUDA kernel for efficient forward passes. AMK's static validator ensures schedule safety,…

TOOL · CL_66851 · Jun 2 · 12:56

Jetson Orin Nano benchmarks 8 tiny LLMs across power modes

A benchmark of eight small language models (135M to ~1B parameters) was conducted on a Jetson Orin Nano Super 8GB device. The tests explored four power modes (7W, 15W, 25W, MAXN) using the llama.cpp CUDA backend. The fi…

RESEARCH · CL_06849 · Apr 28 · 04:00

FlashNorm speeds up transformer inference by optimizing normalization layers

Researchers have developed FlashNorm, a technique to accelerate normalization layers in Transformer models. By reformulating RMSNorm and folding its weights into subsequent linear layers, FlashNorm enables parallel exec…
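The algebraic identity behind folding RMSNorm weights into the following linear layer can be checked with a short NumPy sketch. It assumes a bias-free linear layer applied directly after RMSNorm with per-channel weights `g`, and the function names (`norm_then_linear`, `folded_norm_linear`) are hypothetical. This illustrates the math only; it is not the FlashNorm authors' implementation and says nothing about measured speedups.

```python
import numpy as np


def rms(x, eps=1e-6):
    # Per-row root mean square, kept as a column so it broadcasts over features
    return np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def norm_then_linear(x, g, W, eps=1e-6):
    # Unfused path (assumed layout): RMSNorm with weights g, then bias-free linear W
    return (g * x / rms(x, eps)) @ W.T


def folded_norm_linear(x, g, W, eps=1e-6):
    # Fold g into W once, offline: W_folded[i, j] = W[i, j] * g[j]
    W_folded = W * g[np.newaxis, :]
    # The matmul no longer depends on the normalized activations...
    z = x @ W_folded.T
    # ...because 1/rms(x) is a per-row scalar that can be applied afterwards
    return z / rms(x, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))   # 4 tokens, hidden size 8
    g = rng.standard_normal(8)        # RMSNorm weights
    W = rng.standard_normal((16, 8))  # linear layer: 8 -> 16
    print(np.allclose(norm_then_linear(x, g, W), folded_norm_linear(x, g, W)))  # True
```

Because 1/rms(x) is a single scalar per token, it commutes with the matrix multiply. That is one plausible reading of how such a reformulation lets the normalization statistic be computed alongside the matmul instead of strictly before it.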
