The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

arXiv:2605.22870v1. Abstract: Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance. What does CoT contribute if not logical sequencing? In three 1-3B instruction-tuned LMs on GSM8K, we isolate the answer-readout stage via prefix completion and identify a positional shortcut: the model copies whichever number occupies the trailing position before the answer delimiter, regardless of intermediate reasoning. Gold-answer presence accounts for 54-92 pp of accuracy (89-92% of each model's teacher-fo
Work on interpreting small language models (SLMs) and improving their reliability keeps uncovering limitations and shortcuts in how they reason. This finding comes from a closer look at how CoT actually operates in smaller models.
The research identifies a shortcut in how small language models handle arithmetic under CoT prompting: what looks like reasoning at the answer-readout stage is often positional number copying rather than logical inference. That limits how far these models can be trusted in critical tasks.
The finding revises our picture of small language models under Chain-of-Thought (CoT) prompting: at readout they rely heavily on a positional shortcut, which lowers confidence that they follow the logic of their own intermediate steps.
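The positional shortcut described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's code: the `####` delimiter (borrowed from GSM8K's answer format), the number pattern, and all function names are assumptions made for illustration. The sketch finds the number in the trailing position of a CoT prefix, swaps it while leaving earlier steps intact, builds a prefix-completion prompt that stops at the delimiter, and measures how often a model's answer matches the trailing number.

```python
import re
from typing import List, Optional

# Assumed GSM8K-style answer delimiter; the brief does not name the one used.
DELIMITER = "####"

# Unsigned integers or decimals with optional thousands separators.
# Negative numbers are not handled in this sketch.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def trailing_number(cot_prefix: str) -> Optional[str]:
    """Return the last number in the CoT text: what a pure positional copier would emit."""
    matches = NUMBER.findall(cot_prefix)
    return matches[-1].replace(",", "") if matches else None


def swap_trailing_number(cot_prefix: str, replacement: str) -> str:
    """Replace only the last number, leaving every earlier step untouched."""
    matches = list(NUMBER.finditer(cot_prefix))
    if not matches:
        return cot_prefix
    last = matches[-1]
    return cot_prefix[: last.start()] + replacement + cot_prefix[last.end():]


def build_readout_prompt(question: str, cot_prefix: str) -> str:
    """Assemble a prefix-completion prompt that stops right after the delimiter."""
    return f"{question}\n{cot_prefix}\n{DELIMITER} "


def copy_rate(cot_prefixes: List[str], model_answers: List[str]) -> float:
    """Fraction of cases where the model's answer equals the trailing number."""
    if not cot_prefixes:
        return 0.0
    hits = sum(
        trailing_number(p) == a.strip().replace(",", "")
        for p, a in zip(cot_prefixes, model_answers)
    )
    return hits / len(cot_prefixes)
```

In use, one would generate completions for both the original and the swapped prefixes with a model of choice, then pass the prefixes and completions to `copy_rate`. A rate near 1.0 on swapped prefixes would be consistent with trailing-position copying, though this toy check says nothing about the sizes of the effects reported in the paper.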
- AI researchers focused on explainability
- Developers of more robust arithmetic models
- Auditors of AI safety and reliability
- Developers relying solely on CoT for arithmetic in SLMs
- Applications requiring true, verifiable numerical reasoning
- Practitioners overestimating SLM analytical abilities
Further research will focus on distinguishing true reasoning from shortcuts in LLMs across various tasks.
New techniques will emerge to force models into genuine logical processing, moving beyond superficial pattern matching.
This could lead to a re-evaluation of 'intelligence' metrics for AI, emphasizing verifiable logical steps over answers that merely appear correct.
Read at arXiv cs.LG