arXiv:2605.26827v1 Announce Type: new
Abstract: Recent benchmarks reveal that despite strong reasoning capabilities, large language models (LLMs) still struggle to faithfully apply complex contextual knowledge. These failures are often not wholesale reasoning collapses: in context-rich tasks, models may follow the central reasoning path while missing peripheral, persistent, or format-sensitive requirements.
Source: arXiv cs.CL
