SIGNAL · AI · May 27, 2026, 4:00 AM · Signal 75 · Medium term

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

arXiv:2605.26647v1 (announce type: new). Abstract: Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying the same nonlinear transformation to all tokens. In this work, we propose Mixture of Activations (MoA), a token-adaptive FFN design that mixes a dictionary of activation functions using lightweight input-dependent gates while sharing the same linear p
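To make the idea in the abstract concrete, here is a minimal sketch of a token-adaptive mix of activations inside an FFN. It is an illustration under stated assumptions, not the paper's implementation: the abstract is cut off before it gives details, so the softmax gate, the choice of ReLU, GELU, and SiLU as the dictionary, the single shared up- and down-projection, and every name below (`moa_ffn`, `gate_weights`, `W_up`, `W_gate`, `W_down`) are hypothetical.

```python
import math

# Hypothetical activation dictionary; the source does not list the paper's choices.
def relu(v):
    return max(0.0, v)

def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def silu(v):
    return v / (1.0 + math.exp(-v))

ACTIVATIONS = [relu, gelu, silu]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def gate_weights(x, W_gate):
    """Assumed gate: one mixing weight per activation, computed from the token."""
    return softmax(matvec(W_gate, x))

def moa_ffn(x, W_up, W_gate, W_down):
    """FFN for one token vector x, mixing activations with input-dependent weights."""
    h = matvec(W_up, x)              # shared up-projection
    alpha = gate_weights(x, W_gate)  # depends on the token, so the mix is token-adaptive
    mixed = [sum(a * f(v) for a, f in zip(alpha, ACTIVATIONS)) for v in h]
    return matvec(W_down, mixed)     # shared down-projection

# Toy weights: model width 2, hidden width 3, three activations.
W_up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
W_down = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
W_gate = [[50.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

for token in ([1.0, -2.0], [-1.0, 2.0]):
    print(token, gate_weights(token, W_gate), moa_ffn(token, W_up, W_gate, W_down))
```

With these toy weights the first token is routed almost entirely to ReLU, while the second token puts almost no weight on ReLU and splits evenly between GELU and SiLU. In this sketch the gate adds only one small projection per token, and the up- and down-projections are computed once regardless of how many activations are in the dictionary.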

Why this matters

Why now

Pressure to make large language models both cheaper and more capable keeps attention on their core architectural components, including feedforward layers, and is pushing researchers toward more dynamic, input-adaptive designs.

Why it’s important

FFN layers account for a large fraction of an LLM's parameters and nonlinear expressivity, so making them more expressive and efficient could yield more capable, resource-efficient models and potentially lower inference costs.

What changes

FFN designs in Transformers may shift from a single fixed activation function to token-adaptive mixing of several activations, so that different input tokens can receive different nonlinear transformations.

Winners

· AI model developers
· Cloud computing providers (through efficiency gains)
· Hardware manufacturers (optimized for new FFN architectures)

Losers

· Legacy FFN architectures (if MoA becomes dominant)
· Developers relying solely on older, less efficient LLM designs

Second-order effects

Direct

More efficient and powerful large language models become feasible due to improved foundational architecture.

Second

Reduced computational costs for AI inference could democratize access to advanced AI capabilities.

Third

The enhanced performance of LLMs contributes to the acceleration of AI agents' capabilities and broader AI applications.

Editorial confidence: 90 / 100 · Structural impact: 60 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.LG

#cs.LG #cs.AI #stat.ML

Tracked by The Continuum Brief · live intelligence network
