TUNABLE SQUASHING ACTIVATION FUNCTION FOR DEEP NEURAL NETWORKS
DOI: https://doi.org/10.15588/1607-3274-2026-1-5

Keywords: squashing activation function, deep neural network, ReLU, gradient procedures, training signal

Abstract
Context. Artificial neural networks are now widely used to solve information processing problems of the most diverse nature, above all data mining, owing to their universal approximating capabilities and their ability to learn their parameters, the synaptic weights. Training a multilayer network consists of adjusting the synaptic weights of each neuron using the error backpropagation procedure, which is based on the chain rule of differentiation for composite functions and on gradient-based optimization. Deep neural networks (DNNs) are based on multilayer perceptrons, which have proven effective in many very complex problems related to the processing and synthesis of images of various natures, natural language texts, and multidimensional stochastic and chaotic sequences, including audio and video signals. Unlike classical three-layer perceptrons, DNNs contain dozens or hundreds of layers, and the number of their synaptic weights is comparable with, or even exceeds, the number of synapses in the biological brain. For such networks the vanishing gradient effect is extremely undesirable; therefore, instead of traditional squashing functions, piecewise-linear constructions are usually used, the most popular of which is the so-called ReLU.
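To make the vanishing-gradient argument concrete, the following minimal sketch (illustrative only, not taken from the paper, assuming nothing beyond the standard definitions of the logistic sigmoid and ReLU) compares the per-layer gradient attenuation of a classical squashing function with that of ReLU:

```python
import numpy as np

# Rough numerical illustration: the derivative of the logistic sigmoid never
# exceeds 0.25, so an error signal backpropagated through many sigmoid layers
# is attenuated at least geometrically, while ReLU passes the gradient
# through with factor 1 on its active side.

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    return (x > 0).astype(float)

x = np.linspace(-5.0, 5.0, 1001)
depth = 50  # DNNs contain "dozens or hundreds of layers"

print("max sigmoid'(x) =", sigmoid_grad(x).max())  # 0.25
print("per-layer bound ** %d layers = %.2e"
      % (depth, sigmoid_grad(x).max() ** depth))   # vanishes toward zero
print("max relu'(x)    =", relu_grad(x).max())     # 1.0, no attenuation on active units
```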
Objective. The purpose of this work is to introduce a tunable (adaptive) activation function for deep neural networks based on the most widely used piecewise-linear function, ReLU.
Method. A new tunable activation function for deep neural networks is proposed on the basis of the most common piecewise-linear function, ReLU, which by itself does not satisfy the conditions of G. Cybenko’s approximation theorem but does protect the learning process against the undesirable effect of vanishing gradients. The proposed function retains this protection while also being a squashing (bounded) function, so the universal approximating properties are preserved.
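The abstract does not give the authors’ exact formula, so the sketch below only assumes a plausible clipped-ReLU-style form f(x; a, c) = min(max(ax, 0), c) with a tunable slope a > 0 and saturation level c > 0; it is meant to show how a piecewise-linear function can be bounded (squashing) while keeping a non-vanishing derivative on its linear segment:

```python
import numpy as np

# Hypothetical form for illustration only -- not the authors' formula.
# f(x; a, c) = min(max(a*x, 0), c) is piecewise linear and bounded by [0, c],
# yet on its open linear segment the derivative equals a and does not vanish.

def tunable_squash(x, a=1.0, c=1.0):
    return np.minimum(np.maximum(a * x, 0.0), c)

def dsquash_dx(x, a=1.0, c=1.0):
    # derivative w.r.t. the input: a on the linear segment, 0 elsewhere
    return np.where((a * x > 0.0) & (a * x < c), a, 0.0)

def dsquash_da(x, a=1.0, c=1.0):
    # derivative w.r.t. the tunable slope a, used to adapt the activation itself
    return np.where((a * x > 0.0) & (a * x < c), x, 0.0)

def dsquash_dc(x, a=1.0, c=1.0):
    # derivative w.r.t. the tunable saturation level c
    return np.where(a * x >= c, 1.0, 0.0)
```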
Results. A new adaptive piecewise-linear activation function based on ReLU is introduced that is both squashing and protected against vanishing gradients. During training, not only the synaptic weights of the network are adjusted, but also the parameters of the activation function itself. Using the proposed function makes it possible to reduce the number of neurons and hidden layers in the network, the number of required training samples, and the time needed to train the network.
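As a generic illustration of such joint tuning (again under the assumed clipped-ReLU form sketched above, not the authors’ exact procedure), a single neuron can be trained by gradient descent so that the synaptic weights and the activation parameters are updated from the same error signal:

```python
import numpy as np

# Generic sketch: one neuron trained by gradient descent on a squared error,
# where both the synaptic weights w and the activation parameters (a, c)
# of the assumed clipped-ReLU form are adjusted.

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.clip(X @ np.array([0.5, -0.3, 0.8]), 0.0, 0.7)  # synthetic target

w = rng.normal(scale=0.1, size=3)
a, c = 1.0, 1.0
lr = 0.05

for epoch in range(500):
    u = X @ w                                   # linear combiner output
    active = (a * u > 0.0) & (a * u < c)        # linear segment of the activation
    saturated = a * u >= c                      # saturation segment
    y_hat = np.minimum(np.maximum(a * u, 0.0), c)
    e = y_hat - y                               # error signal
    # chain-rule gradients of the mean squared error
    w -= lr * (X.T @ (e * active * a)) / len(y)  # synaptic weights
    a -= lr * np.mean(e * active * u)            # tunable slope of the activation
    c -= lr * np.mean(e * saturated)             # tunable saturation level
```

In a multilayer network the same chain rule delivers the error signal to the activation parameters of every layer, so an additional circuit for tuning these parameters fits naturally into an existing backpropagation pass.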
Conclusions. An adaptive squashing activation function for deep neural networks, based on the widely used ReLU, is introduced; it provides universal approximating properties while preventing vanishing gradients. A training procedure using this function is proposed that offers high performance and a simple numerical implementation. An additional circuit for tuning the parameters of the activation functions can be introduced quite simply into existing deep neural networks that use piecewise-linear activation functions.
References
Cybenko G. Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, 1989, Vol. 2, No. 4, pp. 303–314.
Hornik K., Stinchcombe M., White H. Multilayer feedforward networks are universal approximators, Neural Networks, 1989, Vol. 2, No. 5, pp. 359–366.
Hornik K. Approximation capabilities of multilayer feedforward networks, Neural Networks, 1991, Vol. 4, No. 2, pp. 251–257.
Poggio T., Girosi F. Networks for approximation and learning, Proceedings of the IEEE, 1990, Vol. 78, No. 9, pp. 1481–1497.
Haykin S. Neural Networks: A Comprehensive Foundation. 2nd ed. Prentice Hall PTR, 2004.
Vapnik V. N. The Nature of Statistical Learning Theory. New York, Springer, 1995.
Cortes C., Vapnik V. Support-vector networks, Machine Learning, 1995, Vol. 20, No. 3, pp. 273–297. DOI: https://doi.org/10.1007/bf00994018.
Bodyanskiy Ye., Zaychenko Yu. and Hamidov G. Hybrid Deep Learning Networks Based on Self-Organization and their Applications. Cambridge Scholars Publishing, 2024.
Kaczmarz S. Approximate solution of systems of linear equations, International Journal of Control, 1993, Vol. 57, No. 6, pp. 1269–1271.
Widrow B. and Hoff M. E. Adaptive switching circuits, 1960 IRE WESCON Convention Record, 1960, pp. 96–104.
Bodyanskiy Ye., Kolodyazhniy V. and Stephan A. An adaptive learning algorithm for a neuro-fuzzy network, International Conference on Computational Intelligence. Berlin, Heidelberg, Springer Berlin Heidelberg, 2001, pp. 68–75.
Copyright (c) 2026 A. Yu. Shafronenko, Ye. V. Bodyanskiy, Ye. O. Shafronenko, F. A. Brodetskyi, О. S. Tanianskyi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.