A benchmark ran eight small language models (135M to ~1B parameters) on a Jetson Orin Nano Super 8GB across four power modes (7W, 15W, 25W, MAXN) using the llama.cpp CUDA backend. For every tested model, the 25W mode gave the best balance of performance and efficiency, beating both the 15W and MAXN modes in tokens generated per joule.
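The efficiency metric named above, tokens per joule, is throughput divided by power: since 1 W equals 1 J/s, (tokens/s) / W gives tokens/J. The sketch below shows that arithmetic. The function names are hypothetical, the readings are placeholder values and not results from the benchmark, and using measured average power draw (rather than the nominal mode cap) is an assumption, since the summary does not say how power was recorded.

```python
def tokens_per_joule(tokens_per_second: float, avg_power_watts: float) -> float:
    """Energy efficiency: tokens generated per joule of energy consumed."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    # 1 W = 1 J/s, so (tokens/s) / (J/s) = tokens/J.
    return tokens_per_second / avg_power_watts


def most_efficient_mode(readings: dict[str, tuple[float, float]]) -> str:
    """Return the mode whose (tokens/s, watts) pair yields the most tokens/J."""
    return max(readings, key=lambda mode: tokens_per_joule(*readings[mode]))


# Placeholder values for illustration only, NOT measurements from the benchmark.
# Each entry is (tokens per second, average power in watts).
example_readings = {
    "mode_a": (40.0, 10.0),
    "mode_b": (60.0, 12.0),
    "mode_c": (66.0, 16.5),
}

if __name__ == "__main__":
    for mode, (tps, watts) in example_readings.items():
        print(f"{mode}: {tokens_per_joule(tps, watts):.2f} tokens/J")
    print("most efficient:", most_efficient_mode(example_readings))
```

The placeholder numbers illustrate how a mode with the highest raw throughput can still lose on tokens per joule when its power draw rises faster than its speed.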
IMPACT Identifies the most energy-efficient power mode for running small LLMs on this class of edge device, guiding hardware and software configuration.
RANK_REASON Benchmark of multiple small LLMs on specific hardware.
- Gemma3-1B
- Jetson Orin Nano Super 8GB
- LFM2.5-1.2B
- LFM2.5-350M
- Llama3.2-1B
- llama.cpp
- NVIDIA
- SmolLM2-135M
AI-generated summary · Google Gemini · from 1 source.