
arXiv:2605.23970v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as automatic judges for summarization and dialogue evaluation. Prior work has documented biases such as position, verbosity, and style preferences, but largely focuses on outcomes, leaving judge explanations underexplored. We instead ask whether LLM judges are cue-invariant, i.e., whether their rankings and explanations remain stable when non-evidential cues are perturbed while holding the underlying texts fixed. We introduce a suite of cue interventions (Blind, Truth, Flip, Placebo, Reveal-After
As LLMs are increasingly deployed as judges, understanding their biases and the reliability of their evaluations requires moving beyond outcome analysis to controlled tests of what actually drives their rankings and explanations.
Understanding and mitigating rationalization bias in LLM judges is crucial for developing trustworthy and equitable autonomous systems, directly impacting the integrity of automated decision-making in various applications.
This research shifts the focus from simply observing LLM biases to causally investigating their origins, enabling more targeted interventions and improvements in LLM judge design.
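The cue-invariance question raised in the abstract (do verdicts stay stable when a non-evidential cue changes while the texts stay fixed?) can be illustrated with a small check. This is only a sketch under stated assumptions, not the paper's protocol: the judge signature, the function name `cue_flip_rate`, the toy judge, and both cue strings are hypothetical, and they do not reproduce the paper's named interventions, whose definitions are not given in the excerpt above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical judge signature: (text_a, text_b, cue) -> "A" or "B".
# A real judge would wrap an LLM call; none is made here.
Judge = Callable[[str, str, Optional[str]], str]


def cue_flip_rate(
    judge: Judge,
    pairs: List[Tuple[str, str]],
    cues: Dict[str, str],
) -> Dict[str, float]:
    """For each cue, return the fraction of pairs whose verdict differs
    from the verdict given when no cue is shown (texts held fixed)."""
    flips = {name: 0 for name in cues}
    for text_a, text_b in pairs:
        baseline = judge(text_a, text_b, None)  # verdict with no cue
        for name, cue in cues.items():
            if judge(text_a, text_b, cue) != baseline:
                flips[name] += 1
    n = len(pairs)
    return {name: (count / n if n else 0.0) for name, count in flips.items()}


def toy_judge(text_a: str, text_b: str, cue: Optional[str]) -> str:
    """Stand-in judge (assumption, not an LLM): prefers the longer text,
    unless a cue claims that A comes from the stronger model."""
    if cue is not None and "A is from the stronger model" in cue:
        return "A"
    return "A" if len(text_a) >= len(text_b) else "B"


# Illustrative data: two text pairs and two made-up cues.
pairs = [("short", "a much longer summary"), ("a detailed summary", "brief")]
cues = {
    "label_favours_a": "Note: A is from the stronger model.",
    "irrelevant_note": "Note: both texts were formatted on a Tuesday.",
}
rates = cue_flip_rate(toy_judge, pairs, cues)
print(rates)
```

A cue-invariant judge would show a flip rate of zero for every cue. In this toy setup the source label changes one of the two verdicts, while the irrelevant note changes none.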
- AI developers
- Auditors of AI systems
- Companies seeking explainable AI
- Developers neglecting bias mitigation
- Systems relying on unscrutinized LLM judgments
Improved reliability and fairness of LLM-based evaluation and decision-making systems.
Increased adoption of LLM judges in more sensitive domains due to enhanced trustworthiness.
The development of a new field of 'AI Judge forensics' focused on deconstructing and predicting LLM rationales.
Read at arXiv cs.CL