Cheap AI model beats GPT-4o and Gemini in email triage test

By PulseAugur Editorial · [1 source] · 2026-06-24 16:59

A developer built an email firewall that uses AI models to sort incoming messages into four tiers: SILENT, QUEUE, PUSH, and AUTO. Contrary to expectations, the less expensive google/gemini-2.5-flash outperformed both GPT-4o and Gemini 2.5 Pro in a small-scale evaluation on 50 emails, reaching 88% accuracy. The developer attributes the result to the nature of the task: it calls for consistent signal scoring rather than deep reasoning, so a faster, cheaper model is a better fit and is less prone to overthinking simple decisions.

IMPACT Suggests that for narrow, repetitive classification tasks, cheaper AI models can outperform more expensive ones, challenging the assumption that the most capable model is always the best choice.

RANK_REASON Developer's personal evaluation of AI models for a specific task.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · yongrean · 2026-06-24 16:59

I let GPT-4o and a cheaper model fight over my inbox. GPT-4o lost.

Here's the scoreboard. Same 50 emails, same prompt, same 4-tier task:

Model                     Accuracy   Note
google/gemini-2.5-flash   88%        …
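The excerpt does not show how the accuracy figure was computed, so the following is only a minimal sketch of one plausible scoring scheme: each model reply is mapped to one of the four tiers and compared against a hand-labeled answer. The names `Tier`, `parse_tier`, and `accuracy` are hypothetical, and so is the choice to fall back to QUEUE when a reply is unrecognized; none of this is taken from the original post.

```python
from enum import Enum


class Tier(Enum):
    """The four triage tiers named in the article."""
    SILENT = "SILENT"
    QUEUE = "QUEUE"
    PUSH = "PUSH"
    AUTO = "AUTO"


def parse_tier(reply: str, default: Tier = Tier.QUEUE) -> Tier:
    """Map a raw model reply to a tier.

    Assumption: the model is prompted to answer with a bare tier name.
    Unrecognized replies fall back to `default` (a hypothetical choice).
    """
    token = reply.strip().upper()
    for tier in Tier:
        if token == tier.value:
            return tier
    return default


def accuracy(predicted: list[Tier], expected: list[Tier]) -> float:
    """Fraction of emails whose predicted tier matches the labeled tier."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot score an empty evaluation set")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)
```

Under this scheme, a score of 88% on 50 emails would correspond to 44 exact tier matches. Exact-match scoring treats every mistake equally, so it would not distinguish a harmless QUEUE/SILENT mix-up from a missed PUSH.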
