
arXiv:2602.12390v2. Abstract: We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of $\varepsilon>0$, we establish approximation-theoretic separations: Any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation networ
This paper reports a theoretical expressivity advantage for trainable low-degree rational activation functions over the fixed activations in common use today. It targets a core limitation of those activations, their parameter efficiency at a given approximation error, at a time when improving model efficiency and performance is an active priority.
A strategic reader should care because, if the result carries over to practice, it could lead to more parameter-efficient and expressive AI models, affecting the architecture and computational demands of future AI systems. Lower compute requirements for a given level of performance could in turn broaden access and accelerate innovation.
The dominance of piecewise-linear and smooth activation functions in neural networks may be challenged, leading to a shift towards rational activation functions for improved model performance and efficiency. This could redefine best practices in neural network design.
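For readers unfamiliar with the term, a rational activation is a ratio of two polynomials whose coefficients are treated as parameters. The sketch below is a minimal illustration in plain Python, not the paper's construction: the class name `RationalActivation` and its coefficient layout are hypothetical, and the coefficients are simply set to the well-known [3/2] Padé approximant of tanh, x(15 + x^2) / (15 + 6x^2), to show how closely a low-degree rational function can track a standard smooth activation near the origin.

```python
import math


class RationalActivation:
    """Hypothetical low-degree rational activation r(x) = P(x) / Q(x).

    Illustration only; this parameterization is an assumption and is not
    taken from the paper.
    """

    def __init__(self, p_coeffs, q_coeffs):
        # p_coeffs[k] multiplies x**k in the numerator P(x).
        # q_coeffs[k] multiplies x**(k + 1) in the denominator Q(x);
        # the constant term of Q is fixed at 1.
        # In a real network these coefficients would be updated by the optimizer.
        self.p = list(p_coeffs)
        self.q = list(q_coeffs)

    def __call__(self, x):
        num = sum(c * x**k for k, c in enumerate(self.p))
        den = 1.0 + sum(c * x ** (k + 1) for k, c in enumerate(self.q))
        # Caveat: for arbitrary coefficients Q(x) can reach zero (a pole).
        # The coefficients used below give Q(x) = 1 + 0.4 * x**2 > 0 for all x.
        return num / den


# [3/2] Pade approximant of tanh: (x + x**3 / 15) / (1 + 2 * x**2 / 5).
tanh_like = RationalActivation(
    p_coeffs=[0.0, 1.0, 0.0, 1.0 / 15.0],
    q_coeffs=[0.0, 0.4],
)

# Measure the worst-case gap to tanh on a grid over [-1, 1].
grid = [i / 100.0 for i in range(-100, 101)]
max_err = max(abs(tanh_like(x) - math.tanh(x)) for x in grid)
print(f"max |r(x) - tanh(x)| on [-1, 1]: {max_err:.2e}")
```

With only three non-zero coefficients the rational function stays within 1e-3 of tanh on [-1, 1]. The approximation degrades away from the origin, which is one reason a trainable version would adapt its coefficients to the data rather than keep them fixed.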
- AI researchers
- Deep learning practitioners
- Hardware manufacturers (potentially optimizing for new architectures)
- Cloud computing providers (reduced computation costs for customers)
- Developers heavily invested in older activation function paradigms
- Companies whose competitive advantage relies solely on massive compute (if effic
Neural networks with rational activation functions could achieve higher performance with fewer parameters, provided the paper's approximation-theoretic results translate into practice.
This efficiency gain could lower the computational barriers to entry for advanced AI development, potentially diversifying the field.
Reduced compute requirements in model training and inference could ease pressure on energy grids and semiconductor supply chains, impacting the broader compute infrastructure.
Read at arXiv cs.LG