Speaking in Self-Assessing Tongues: On the Verbalized Confidence of LLMs in Machine Translation

arXiv:2606.17234v1

Abstract: The rapid rise in popularity of large language models (LLMs) for translation calls for a thorough study of the reliability of their confidence in their own outputs. Unlike many generation tasks, translation errors and confidence levels can be useful at different levels of granularity (tokens, words, or spans). Unsupervised approaches based on internal signals like predicted probabilities can be misleading because they reflect certainty among alternatives rather than correctness. In addition, they require access to such internal signals. Here, we
The rapid deployment of large language models for machine translation, and the growing reliance on them, call for a deeper understanding of how reliable their output is.
The ability of LLMs to self-assess their confidence accurately is critical for their safe and effective integration into sensitive applications and workflows, impacting trust and adoption.
This research introduces methodologies to evaluate LLM confidence beyond internal probabilities, which could lead to more robust and transparent AI translation systems.
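One common way to check whether stated confidence tracks correctness is expected calibration error (ECE). The sketch below is illustrative only and is not taken from the paper: the function name, the equal-width binning, and the toy word-level scores are all assumptions made here.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Equal-width-bin ECE: the size-weighted mean gap between the
    average stated confidence and the observed accuracy in each bin.

    Hypothetical helper for illustration; not the paper's metric.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Assign each (confidence, correctness) pair to a bin over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Toy data (made up): a model that says 0.9 for every translated word
# but gets only one of four right.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, False, False]
)
```

A lower value means the stated confidence is closer to the observed accuracy. The same computation applies whether the scores come from verbalized confidence or from token probabilities, so it can be used to compare the two.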
- AI developers
- Translation services
- Industries relying on machine translation
- Providers of unreliable LLM translation
- Users relying solely on probabilistic confidence scores
Improved reliability and trust in LLM-powered machine translation.
Increased adoption of LLMs in critical translation tasks previously reserved for human translators.
Potential for new human-AI interfaces designed to leverage verbalized confidence for efficient post-editing or quality assurance.
Read at arXiv cs.CL