Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization

arXiv:2605.11632v2. Abstract: Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM behavior. Yet extending them beyond English remains challenging: existing methods struggle to produce valid SCEs in non-dominant languages, and a persistent trade-off between validity and minimality undermines explanation quality. We introduce Macro, a preference alignment framework that applies Direct
The increasing deployment of large language models globally necessitates robust and understandable explanations for their behavior across diverse linguistic contexts, especially as LLMs become more integrated into critical applications.
Enhancing multilingual counterfactual explanations directly addresses a major challenge in LLM transparency and trustworthiness, both of which are crucial for adoption and regulatory acceptance in non-English-speaking markets.
The ability to generate high-quality, valid, and minimal counterfactual explanations in multiple languages improves the auditability and explainability of LLMs, potentially accelerating their global deployment and integration.
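The two properties named in the abstract, validity (the prediction flips) and minimality (the input changes as little as possible), can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the paper's method: `toy_classifier` is a hypothetical keyword stand-in for an LLM, tokens are split on whitespace, and minimality is approximated by a token-level edit distance normalized by the original length, which is one common choice and may differ from the metric Macro uses.

```python
from typing import Callable, List


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance computed over tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def is_valid(classify: Callable[[str], str], original: str, counterfactual: str) -> bool:
    """Validity: the counterfactual receives a different label than the original."""
    return classify(original) != classify(counterfactual)


def minimality_cost(original: str, counterfactual: str) -> float:
    """Assumed minimality proxy: token edits divided by original length (lower is better)."""
    orig_tokens = original.split()
    cf_tokens = counterfactual.split()
    return token_edit_distance(orig_tokens, cf_tokens) / max(len(orig_tokens), 1)


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for an LLM sentiment prediction (not a real model)."""
    negative_words = {"mala", "bad", "schlecht"}
    return "negative" if negative_words & set(text.lower().split()) else "positive"


original = "la película fue buena"
counterfactual = "la película fue mala"
print(is_valid(toy_classifier, original, counterfactual))  # label flips for this pair
print(minimality_cost(original, counterfactual))           # one of four tokens changed
```

Scoring both properties separately makes the trade-off described in the abstract visible: rewriting the whole sentence would flip the label easily but drive the minimality cost up, while a one-token edit keeps the cost low but may fail to flip the prediction.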
- AI developers (especially those working in non-English languages)
- Regulatory bodies
- LLM auditing firms
- Users of AI in diverse linguistic contexts
- Developers of proprietary, black-box LLMs without strong interpretability
- Monolingual AI explanation tools
Improved trust in and adoption of LLMs in non-English-speaking regions due to better explainability.
Increased demand for multilingual LLMs and explanation frameworks, fostering more equitable global AI development.
Potential for regulatory frameworks to mandate multilingual explainability, impacting market entry and compliance for AI products.
Read at arXiv cs.CL