A new research paper published on arXiv explores the impact of confidence scale design on Large Language Models (LLMs). The study found that LLMs tend to concentrate their reported confidence scores on round numbers, regardless of the scale's range or regularity. Researchers manipulated confidence scales across different granularities and boundary placements, discovering that a 0-20 scale consistently improved metacognitive efficiency compared to the standard 0-100 scale. The findings suggest that confidence scale design is a critical factor in evaluating LLM uncertainty and should be treated as a primary experimental variable.
AI IMPACT Suggests that LLM evaluation methods need refinement by considering confidence scale design as a critical factor.
RANK_REASON Research paper published on arXiv detailing findings about LLM metacognition and confidence scales.
AI-generated summary · Google Gemini · from 1 source.