arXiv:2606.00739v1 Announce Type: new Abstract: Large language models hallucinate even when the answer lies within their parameters. While inference-time scaling can surface this latent knowledge, the most effective methods require supervision: a trained verifier or reward model. We ask what can be done with only a base language model: which intrinsic signal best identifies correct outputs, and how should it be decoded? We cast this as a score~$\times$~decoder grid pairing four scores (perplexity, contrastive, power-distribution likelihood, and self-verification) with three decoding families (

Source: arXiv cs.LG — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.