SIGNAL · AI · Jun 16, 2026, 4:00 AM · Signal 75 · Short term

Rescaling Confidence: What Scale Design Reveals About LLM Metacognition

Source: arXiv cs.AI


arXiv:2603.09309v2. Announce Type: replace. Abstract: Verbalized confidence, in which LLMs report a numerical certainty score, is widely used to estimate uncertainty in black-box settings, yet the confidence scale itself (typically 0-100) is rarely examined. We show that this design choice is not neutral. Across six LLMs and three datasets, verbalized confidence is heavily discretized, with more than 78% of responses concentrating on just three round-number values. To investigate this phenomenon, we systematically manipulate confidence scales along three dimensions: granularity, boundary place

Why this matters
Why now

This research offers timely insight into how LLMs report their own confidence and where those reports fall short, arriving as confidence estimates become increasingly critical to real-world deployment.

Why it’s important

Understanding how LLMs express confidence directly impacts system reliability, trust, and our ability to interpret and utilize their outputs in sensitive applications.

What changes

Because verbalized confidence is heavily discretized and depends on the scale used to elicit it, raw confidence scores should not be taken at face value; the elicitation method has to be considered when interpreting them.
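One way to check for the discretization described in the abstract is to measure how much of a model's reported confidence lands on its few most common values. The sketch below is a minimal illustration, not the paper's method: the function name `top_k_share` and the sample scores are hypothetical, and the data is made up purely to show the calculation.

```python
from collections import Counter


def top_k_share(scores, k=3):
    """Return (share, values): the fraction of responses that fall on the
    k most frequent confidence values, and those values."""
    if not scores:
        raise ValueError("scores must be non-empty")
    counts = Counter(scores)
    top = counts.most_common(k)
    share = sum(n for _, n in top) / len(scores)
    return share, [value for value, _ in top]


# Made-up confidence reports on a 0-100 scale, for illustration only.
sample = [90, 95, 90, 80, 95, 90, 85, 100, 90, 95]
share, values = top_k_share(sample)
print(f"{share:.0%} of responses fall on {values}")
```

A share close to 1.0 for small k suggests the scores behave more like a handful of categories than a continuous scale, which is worth knowing before thresholding or averaging them.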

Winners
  • AI researchers focusing on interpretability
  • Developers of robust AI systems
  • Industries requiring high-assurance AI
Losers
  • Systems relying on naive interpretation of LLM confidence
  • LLMs with poorly designed confidence mechanisms
Second-order effects
Direct

This work will lead to improved methods for eliciting and calibrating confidence in LLMs.

Second

Better confidence calibration will enable more reliable AI agents and decision support systems.

Third

Increased reliability and trust could accelerate the integration of AI into critical infrastructure and white-collar workflows.

Editorial confidence: 90 / 100 · Structural impact: 55 / 100
Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
