SIGNAL · AI · Jun 19, 2026, 4:00 AM · Signal 55 · Short term

How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

arXiv:2606.19379v1 Announce Type: cross Abstract: Transformer feed-forward networks (FFNs) are often treated as nonlinear stores of computation, yet how nonlinear a trained FFN block actually is has rarely been measured. We treat each FFN as a position-wise input-to-output map and split it into the exact least-squares linear approximation plus a residual. The held-out variance the closed-form linear map explains defines a block's linear recoverability (R^2_lin), an optimiser-free measure of its linearity. Across all twelve blocks of GPT-2, Pythia-160m, and llama-160m, R^2_lin is highly heterog

Why this matters

Why now

This paper measures a basic property of transformer architectures that has rarely been quantified: how nonlinear a trained FFN block actually is. The question becomes more pressing as these models grow more prevalent and complex.

Why it’s important

Understanding the linear recoverability of FFN blocks could inform more efficient transformer designs, improve interpretability, and potentially reduce computational overhead in critical AI applications.

What changes

This research provides a quantifiable metric (R^2_lin) for the linearity of each FFN block, shifting the focus from a blanket assumption of nonlinearity to a property that is measured empirically and, per the paper's title, learned rather than architectural.
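
To make the metric concrete, here is a minimal sketch of how a held-out R^2_lin could be computed, following the abstract's description: fit the closed-form least-squares map from FFN inputs to FFN outputs, then score the variance it explains on held-out positions. This is not the paper's code. The function name, the bias column, the train/test split, and the toy random FFN are all assumptions made for illustration; a real measurement would use activations captured from a trained model.

```python
import numpy as np

def linear_recoverability(X_train, Y_train, X_test, Y_test):
    """Held-out R^2 of the closed-form least-squares map from X to Y."""
    # Assumption: include a bias column, so the fitted map is affine.
    A_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    A_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    # Closed-form least-squares solution; no optimiser involved.
    W, *_ = np.linalg.lstsq(A_train, Y_train, rcond=None)
    residual = Y_test - A_test @ W
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((Y_test - Y_test.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

def gelu(x):
    # Tanh approximation of GELU, to keep the sketch NumPy-only.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def toy_ffn(X, W_in, W_out):
    # Hypothetical stand-in for a transformer FFN block (random, untrained).
    return gelu(X @ W_in) @ W_out

rng = np.random.default_rng(0)
d_model, d_hidden, n = 16, 64, 4000
W_in = rng.normal(size=(d_model, d_hidden)) / np.sqrt(d_model)
W_out = rng.normal(size=(d_hidden, d_model)) / np.sqrt(d_hidden)

# Each row stands for one token position's input to the block.
X = rng.normal(size=(n, d_model))
X_train, X_test = X[:3000], X[3000:]

# Nonlinear toy block: the linear map recovers only part of the output.
r2_ffn = linear_recoverability(
    X_train, toy_ffn(X_train, W_in, W_out),
    X_test, toy_ffn(X_test, W_in, W_out),
)

# Sanity check: a purely linear "block" should be fully recoverable.
W_lin = rng.normal(size=(d_model, d_model))
r2_linear = linear_recoverability(
    X_train, X_train @ W_lin, X_test, X_test @ W_lin
)

print(f"toy FFN R^2_lin: {r2_ffn:.3f}")
print(f"linear map R^2_lin: {r2_linear:.6f}")
```

A value near 1 would indicate that a block behaves almost linearly on the evaluated inputs, while lower values indicate that more of its output depends on the nonlinearity. Scoring on held-out positions guards against the linear fit looking better than it is.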

Winners

· AI researchers
· ML model developers
· Organizations deploying large language models

Losers

· Inefficient transformer architectures

Second-order effects

Direct

The linearity measure helps identify which FFN blocks perform genuinely nonlinear computation and which are well approximated by a linear map.

Second

This understanding could inform the design of more compact and specialized neural network layers for specific tasks, potentially reducing the training and inference costs of large models.

Third

Improved efficiency and interpretability of transformer models could accelerate the development and deployment of advanced AI agents across the industries that depend on them.

Editorial confidence: 85 / 100 · Structural impact: 40 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.CL

#cs.LG #cs.AI #cs.CL

Tracked by The Continuum Brief · live intelligence network
