arXiv:2601.14063v2 Announce Type: replace-cross
Abstract: Cross-cultural competence in large language models (LLMs) requires understanding and adapting Culture-Specific Items (CSIs) across varying cultural contexts. However, progress in evaluating this capability remains limited by the lack of high-quality CSI-annotated corpora with parallel cross-cultural sentence pairs. We introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark containing 4.1k parallel sentences and 1,098 CSIs across three reasoning tasks. XCR-Bench integrates Newmark's CSI framework with Hall's Triad of Culture, enabli…
Source: arXiv cs.AI — read the full report at the original publisher.
