arxiv:2601.07767

Are LLM Decisions Faithful to Verbal Confidence?

Published on Jan 12 · Submitted by Deqing Fu on Jan 13
Abstract

Large language models exhibit a disconnect between their expressed uncertainty and their strategic decision-making under varying penalty conditions, failing to adjust their abstention policies even when adjusting them would be optimal.

AI-generated summary

Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the reasoning, knowledge, or decision making of the model. To test this, we introduce RiskEval: a framework designed to evaluate whether models adjust their abstention policies in response to varying error penalties. Our evaluation of several frontier models reveals a critical dissociation: models are neither cost-aware when articulating their verbal confidence, nor strategically responsive when deciding whether to engage or abstain under high-penalty conditions. Even when extreme penalties render frequent abstention the mathematically optimal strategy, models almost never abstain, resulting in utility collapse. This indicates that calibrated verbal confidence scores may not be sufficient to create trustworthy and interpretable AI systems, as current models lack the strategic agency to convert uncertainty signals into optimal and risk-sensitive decisions.
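To make the abstention trade-off concrete, here is a minimal sketch of the kind of expected-utility reasoning the summary alludes to; the reward and penalty values and the helper names (`answer_utility`, `should_abstain`) are illustrative assumptions, not RiskEval's actual scoring. With reward 1 for a correct answer, penalty p for a wrong one, and 0 for abstaining, a model with confidence c should abstain whenever c − (1 − c)·p < 0, i.e. whenever c < p / (1 + p).

```python
# Illustrative sketch (not the paper's RiskEval implementation): expected-utility
# rule for answering vs. abstaining under an error penalty.

def answer_utility(confidence: float, penalty: float, reward: float = 1.0) -> float:
    """Expected utility of answering: +reward if correct, -penalty if wrong."""
    return confidence * reward - (1.0 - confidence) * penalty

def should_abstain(confidence: float, penalty: float, reward: float = 1.0) -> bool:
    """Abstaining yields 0 utility, so abstain when answering has negative expectation."""
    return answer_utility(confidence, penalty, reward) < 0.0

if __name__ == "__main__":
    # The break-even confidence is p / (1 + p): above it, answering pays off in expectation.
    for penalty in (1.0, 4.0, 10.0):
        threshold = penalty / (1.0 + penalty)
        print(f"penalty={penalty:>4}: abstain below confidence {threshold:.2f}, "
              f"e.g. c=0.8 -> abstain={should_abstain(0.8, penalty)}")
```

Under this rule, a penalty of 10 (an arbitrary example value) already pushes the break-even confidence above 0.9; the summary's point is that models keep answering even in such high-penalty regimes, producing the reported utility collapse.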

Community

Paper author · Paper submitter

While LLMs can express their confidence levels, their actual decisions do not demonstrate risk sensitivity. Even with high error penalties, they rarely abstain from making choices, often leading to utility collapse.
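As a rough illustration of what "utility collapse" means here, the hypothetical simulation below (made-up numbers, not data or code from the paper) compares an always-answer policy with a simple expected-value threshold policy on perfectly calibrated confidences; as the penalty grows, the always-answer policy's total utility goes sharply negative, while the thresholded policy abstains more and more often and avoids the collapse.

```python
# Illustrative simulation (hypothetical numbers, not results from the paper):
# utility collapse of an "always answer" policy as the error penalty grows.
import random

random.seed(0)
N = 1000
# Stated confidence equals true probability of being correct (ideal calibration).
confidences = [random.uniform(0.3, 0.95) for _ in range(N)]
outcomes = [random.random() < c for c in confidences]   # realized correctness

def total_utility(answer_policy, penalty: float) -> float:
    total = 0.0
    for c, correct in zip(confidences, outcomes):
        if answer_policy(c, penalty):          # True -> the model answers
            total += 1.0 if correct else -penalty
        # abstaining contributes 0 utility
    return total

always_answer = lambda c, p: True
ev_threshold = lambda c, p: c >= p / (1.0 + p)   # answer only when expected value >= 0

for penalty in (1.0, 10.0, 50.0):
    print(f"penalty={penalty:>5.0f}: "
          f"always-answer={total_utility(always_answer, penalty):9.1f}  "
          f"ev-threshold={total_utility(ev_threshold, penalty):9.1f}")
```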
