COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

arXiv:2605.30641v1 (cross-listed). Abstract: Large language models (LLMs) can reveal and amplify societal biases during chain-of-thought (CoT) generation. We present COFT (Chain of Fair Thought), a training-free decoding method that applies token-level fairness control at decode time, with distribution-free marginal validity guarantees (under exchangeability) for any frozen causal language model. COFT operates in three stages. First, it creates a masked counterfactual prompt by replacing sensitive spans with neutral tokens. Second, it compares the factual and masked logit distributions th
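The abstract is cut off before the second and third stages are fully described, so the following is only a minimal Python sketch of the ideas it names, not the paper's procedure. It shows one way to build a masked counterfactual prompt, one way to compare factual and masked next-token distributions, and the standard split-conformal quantile that gives marginal validity under exchangeability. Every name here (`NEUTRAL_TOKEN`, `mask_sensitive_spans`, `total_variation`, `conformal_threshold`) is hypothetical, and the use of total variation distance as the comparison score is an assumption chosen for illustration.

```python
import math
import re

NEUTRAL_TOKEN = "[MASK]"  # hypothetical placeholder; the paper's neutral token is not given


def mask_sensitive_spans(prompt, sensitive_terms, neutral=NEUTRAL_TOKEN):
    """Replace whole-word occurrences of sensitive terms with a neutral token."""
    if not sensitive_terms:
        return prompt
    # Longest terms first so multi-word spans win over their substrings.
    ordered = sorted(sensitive_terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(neutral, prompt)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(factual_logits, masked_logits):
    """Assumed comparison score: total variation distance between the
    next-token distributions from the factual and masked prompts."""
    p = softmax(factual_logits)
    q = softmax(masked_logits)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def conformal_threshold(calibration_scores, alpha):
    """Standard split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th
    smallest calibration score, or infinity when n is too small."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]
```

In a real decoder the two logit vectors would come from running the same frozen model on the factual and masked prompts at each step. How COFT turns the comparison into a token-level intervention is not recoverable from the truncated abstract.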
As LLMs are integrated into more critical applications, addressing their biases, particularly in chain-of-thought reasoning, becomes more urgent.
Bias amplification in LLMs undermines trust, fairness, and the reliability of AI systems, making methods for bias mitigation directly relevant to ethical AI deployment and regulatory compliance.
The introduction of training-free, provably valid methods like COFT suggests a pathway to mitigate LLM biases without extensive model retraining, potentially democratizing fair AI development.
- AI developers
- Ethical AI researchers
- Regulatory bodies
- Users of LLM applications
- Generative AI models with inherent biases if unaddressed
- Developers solely relying on traditional fine-tuning for fairness
Because the method is training-free and works with any frozen causal language model, it gives developers a tool they can apply at decode time to deploy more equitable LLM systems.
Broader adoption of such techniques could accelerate public and regulatory acceptance of AI in sensitive domains.
The pursuit of provable fairness guarantees might lead to a fundamental re-evaluation of LLM architectures and their training paradigms.
Read at arXiv cs.AI