SIGNAL · AI · May 25, 2026, 4:00 AM · Signal 75 · Medium term

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

arXiv:2605.22870v1 · Abstract: Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance. What does CoT contribute if not logical sequencing? In three 1-3B instruction-tuned LMs on GSM8K, we isolate the answer-readout stage via prefix completion and identify a positional shortcut: the model copies whichever number occupies the trailing position before the answer delimiter, regardless of intermediate reasoning. Gold-answer presence accounts for 54-92 pp of accuracy (89-92% of each model's teacher-fo

Why this matters

Why now

Interpretability work on small language models (SLMs) keeps surfacing shortcuts behind apparently sound reasoning. This finding arrives as researchers probe the mechanics of CoT in smaller models, prompted by the observation that shuffling CoT steps preserves most arithmetic performance.

Why it’s important

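The paper identifies a shortcut in how small language models read out answers during CoT arithmetic: in three 1-3B instruction-tuned models on GSM8K, the model copies whichever number sits in the trailing position before the answer delimiter, regardless of the intermediate reasoning. Apparent reasoning may therefore often be positional number copying rather than logical inference, which undercuts the reliability of these models in tasks that depend on correct arithmetic.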
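As a rough illustration of the shortcut, the sketch below implements a "copy the trailing number" baseline and measures how often a model's answer coincides with it. This is not the paper's code: the `####` delimiter follows the GSM8K answer format, while the function names, the number pattern, and the record layout are assumptions made for this example.

```python
import re

# Assumption: numbers are integers or decimals with optional thousands
# separators; a leading minus counts only when it does not follow a
# word character (so "10-4" yields 10 and 4, not -4).
NUMBER = re.compile(r"(?<![\w.])-?\d[\d,]*(?:\.\d+)?")


def trailing_number(prefix, delimiter="####"):
    """Return the last number before the answer delimiter, or None."""
    body = prefix.split(delimiter)[0]
    matches = NUMBER.findall(body)
    if not matches:
        return None
    # Drop thousands separators and any comma captured as punctuation.
    return matches[-1].replace(",", "")


def copy_rate(records):
    """Fraction of (prefix, model_answer) pairs where the model's answer
    equals the trailing number in the prefix."""
    if not records:
        return 0.0
    hits = 0
    for prefix, model_answer in records:
        candidate = trailing_number(prefix)
        try:
            same = candidate is not None and float(candidate) == float(
                model_answer.replace(",", "")
            )
        except ValueError:
            same = False  # non-numeric model output never counts as a copy
        hits += same
    return hits / len(records)
```

A high `copy_rate` on prefixes whose trailing number is wrong would be consistent with positional copying, though it does not by itself establish the mechanism.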

What changes

Confidence that small language models follow the logic of their own chain-of-thought should drop: at the answer-readout stage they rely heavily on a positional shortcut. A correct final answer is weaker evidence of sound intermediate reasoning than previously assumed.
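One way to check for this reliance in practice is a simple perturbation probe: replace the trailing number in the reasoning with a distractor and see whether the completion follows it. The sketch below is illustrative only. `complete` is a hypothetical callable standing in for whatever model interface is in use, `copier_stub` is a toy stand-in rather than a real model, and the ` #### ` delimiter is assumed from the GSM8K answer format.

```python
import re
from typing import Callable

# Assumption: only non-negative integers and decimals are considered.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def swap_trailing_number(reasoning: str, distractor: str) -> str:
    """Replace the last number in the reasoning text with a distractor."""
    matches = list(NUMBER.finditer(reasoning))
    if not matches:
        return reasoning
    last = matches[-1]
    return reasoning[: last.start()] + distractor + reasoning[last.end():]


def follows_distractor(
    complete: Callable[[str], str],
    reasoning: str,
    distractor: str,
    delimiter: str = " #### ",
) -> bool:
    """True if the completion after the delimiter echoes the distractor."""
    prefix = swap_trailing_number(reasoning, distractor) + delimiter
    return complete(prefix).strip() == distractor


def copier_stub(prefix: str) -> str:
    """Toy stand-in for a model that always copies the last number it sees."""
    found = NUMBER.findall(prefix)
    return found[-1] if found else ""


# Usage: a pure copier follows the distractor even though 3 + 4 is not 99.
print(follows_distractor(copier_stub, "3 + 4 = 7.", "99"))
```

Swapping in a real model for `copier_stub` and averaging over many problems would give a rough estimate of how often the readout tracks position rather than the computation.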

Winners

· AI researchers focused on explainability
· Developers of more robust arithmetic models
· Auditors of AI safety and reliability

Losers

· Developers relying solely on CoT for arithmetic in SLMs
· Applications requiring true, verifiable numerical reasoning
· Practitioners overestimating SLM analytical abilities

Second-order effects

Direct

Further research will focus on distinguishing true reasoning from shortcuts in LLMs across various tasks.

Second

New techniques will emerge to force models into genuine logical processing, moving beyond superficial pattern matching.

Third

This could lead to a re-evaluation of 'intelligence' metrics for AI, emphasizing verifiable logical steps over apparent correct answers.

Editorial confidence: 90 / 100 · Structural impact: 40 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.LG

#cs.LG #cs.AI #cs.CL

Tracked by The Continuum Brief · live intelligence network
