PulseAugur
Gemma 3 4B verbal confidence training: pre-registered approach fails, post-hoc variant improves Type-2 AUROC

A study on the Gemma 3 4B model investigated methods to improve the verbal confidence it reports alongside its responses. The initial attempt, confidence-conditioned supervised fine-tuning (CSFT) on a filtered dataset, produced a negative result and decreased performance. An exploratory follow-up that removed the filter and trained on all calibration items significantly improved how well the model's verbal confidence predicts whether its answers are correct, reaching a Type-2 AUROC (AUROC2) of 0.774 on TriviaQA.
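For readers unfamiliar with the metric, Type-2 AUROC can be read as the probability that a randomly chosen correct answer carries higher stated confidence than a randomly chosen incorrect one, with ties counted as half. The sketch below is a minimal illustration of that definition, not the paper's evaluation code. The function name `type2_auroc` and the toy inputs are hypothetical and do not come from the study.

```python
from typing import Sequence


def type2_auroc(confidence: Sequence[float], correct: Sequence[bool]) -> float:
    """Pairwise (Mann-Whitney) estimate of Type-2 AUROC.

    confidence: the model's stated confidence per answer (hypothetical values).
    correct:    whether each answer was actually right.
    """
    pos = [c for c, ok in zip(confidence, correct) if ok]
    neg = [c for c, ok in zip(confidence, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correct answer ranked above incorrect one
            elif p == n:
                wins += 0.5   # tie carries no information, counts as half
    return wins / (len(pos) * len(neg))


# Toy data (assumed, for illustration only).
# Confidence stuck at the ceiling cannot separate right from wrong answers.
ceiling = type2_auroc([1.0, 1.0, 1.0, 1.0], [True, False, True, False])

# Confidence that partly tracks correctness scores above chance.
informative = type2_auroc([0.9, 0.4, 0.6, 0.2], [True, True, False, False])
```

On this toy data the ceiling case evaluates to 0.5 (chance) and the partly informative case to 0.75. A score of 0.774 therefore means that confidence ranks a correct answer above an incorrect one in roughly three out of four such pairs.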

IMPACT Demonstrates a potential method to improve confidence calibration in smaller LLMs, impacting their reliability in downstream applications.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL · English (EN) · Jon-Paul Cacioli

    Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B

    arXiv:2604.24070v1. Abstract: Small instruct-tuned LLMs produce degenerate verbal confidence under minimal elicitation: ceiling rates above 95%, near-chance Type-2 AUROC, and Invalid validity profiles. We test whether confidence-conditioned supervised fine-tunin…

  2. arXiv cs.CL · English (EN) · Jon-Paul Cacioli

    Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B

    Small instruct-tuned LLMs produce degenerate verbal confidence under minimal elicitation: ceiling rates above 95%, near-chance Type-2 AUROC, and Invalid validity profiles. We test whether confidence-conditioned supervised fine-tuning (CSFT) with self-consistency-derived targets c…