Unsloth Studio achieves 111.3 tokens/sec with Gemma4:e4B model

By PulseAugur Editorial · [1 source] · 2026-06-23 01:59

A Mastodon user reports that Unsloth Studio generated 111.3 tokens per second with the Gemma4:e4B model (specifically gemma-4-E4B-it-qat-GGUF, UD-Q4_K_XL quantization) on the same laptop they had previously used with Ollama. The user called the speed "just plain crazy." They noted one drawback, that the web app does not automatically speak responses, but suggested they might be able to code a fix themselves.
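For readers unfamiliar with the metric, tokens per second is simply the number of generated tokens divided by the elapsed generation time. The sketch below is illustrative only: the function names and the sample token counts are assumptions, not values from the post, and it does not reflect how Unsloth Studio itself measures "message timing."

```python
def tokens_per_second(n_tokens: int, elapsed_seconds: float) -> float:
    """Throughput = generated tokens / wall-clock generation time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_tokens / elapsed_seconds


def seconds_for(n_tokens: int, rate_tok_per_s: float) -> float:
    """Time needed to generate n_tokens at a given throughput."""
    if rate_tok_per_s <= 0:
        raise ValueError("rate_tok_per_s must be positive")
    return n_tokens / rate_tok_per_s


# Hypothetical run: 1113 tokens generated in 10 seconds.
rate = tokens_per_second(1113, 10.0)

# At the reported 111.3 tok/s, a 500-token reply would take about 4.49 s.
reply_time = seconds_for(500, 111.3)
print(f"{rate:.1f} tok/s, {reply_time:.2f} s for 500 tokens")
```

At this rate a reply of a few hundred tokens finishes in a few seconds, which is consistent with the user's enthusiasm about the speed.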

IMPACT Provides one user-reported throughput figure (111.3 tok/s) for running a quantized model locally on a laptop; the post does not specify the hardware.

RANK_REASON User reports performance metrics for a specific model and software combination.

Read on Mastodon — fosstodon.org →

model release

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-06-23 01:59

So, Unsloth Studeo, same laptop as Ollama, Gemma4:e4B: (technically gemma-4-E4B-it-qat-GGUF · UD-Q4_K_XL) Message timing 111.3 tok/s. Tokens per second. This is just plain crazy. I mean, the web app doesn't automatically speak responses, which sucks, but like I may be able to vib…
