TUNABLE SQUASHING ACTIVATION FUNCTION FOR DEEP NEURAL NETWORKS

Authors

  • A. Yu. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • Ye. V. Bodyanskiy, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • Ye. O. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • F. A. Brodetskyi, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • O. S. Tanianskyi, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2026-1-5

Keywords:

squashing activation function, deep neural network, ReLU, gradient procedures, training signal

Abstract

Context. Artificial neural networks are now widely used to solve information processing problems of the most diverse nature, above all in data mining, owing to their universal approximating capabilities and their ability to learn their parameters, the synaptic weights. A multilayer network is trained by adjusting the synaptic weights of each neuron with the error backpropagation procedure, which rests on the chain rule for differentiating composite functions and on gradient-based optimization. Deep neural networks (DNNs) are built on multilayer perceptrons and have proven effective in many very complex problems involving the processing and synthesis of images of various kinds, natural language texts, and multidimensional stochastic and chaotic sequences, including audio and video signals. Unlike classical three-layer perceptrons, DNNs contain dozens or hundreds of layers, and their number of synaptic weights is commensurate with, or even exceeds, the number of synapses in the biological brain. For such networks the vanishing gradient effect is extremely undesirable: the derivatives of traditional squashing (sigmoidal) activation functions are less than one almost everywhere, so their product over many layers drives the backpropagated gradient toward zero. Therefore, instead of traditional squashing functions, piecewise-linear constructions are usually used here, the most popular of which is the so-called ReLU.
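
To make the vanishing-gradient argument concrete, the toy computation below (an illustrative sketch, not taken from the paper; the depth of 50 layers is an arbitrary assumption) compares the chain-rule factor contributed by activation derivatives along a deep path for a sigmoid versus ReLU:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    depth = 50  # assumed depth, for illustration only

    # The sigmoid derivative s(x) * (1 - s(x)) never exceeds 0.25, so the
    # chain-rule product of activation derivatives along a 50-layer path
    # is at most 0.25**50: the gradient effectively vanishes.
    s = sigmoid(0.0)               # the derivative is largest at x = 0
    sig_grad = s * (1.0 - s)       # equals 0.25
    print("per-layer sigmoid derivative:", sig_grad)
    print("50-layer sigmoid chain factor:", sig_grad ** depth)  # ~7.9e-31

    # ReLU's derivative is exactly 1 on its active half-line, so the same
    # product stays 1 along an active path and the gradient is preserved.
    print("50-layer ReLU chain factor:", 1.0 ** depth)
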
Objective. The purpose of the work is to introduce an adaptive (tunable) squashing activation function for deep neural networks, built on the most widely used piecewise-linear activation function, ReLU.
Method. A new tunable squashing activation function for deep neural networks is proposed. It is built on the most common piecewise-linear activation, ReLU, which by itself does not satisfy the conditions of G. Cybenko’s approximation theorem but does protect the learning process against the undesirable vanishing-gradient effect; the proposed function combines both properties.
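
The paper’s exact parametrization is not reproduced in this abstract. As a minimal illustrative sketch (the specific form clip(a·x, 0, c) and the parameter names a and c are assumptions, not the authors’ notation), one bounded piecewise-linear activation with a tunable slope and saturation level could look like this:

    import numpy as np

    def tunable_squash(x, a=1.0, c=1.0):
        # Piecewise linear: 0 for x <= 0, slope a on the middle segment,
        # saturating at the tunable ceiling c > 0. The finite ceiling makes
        # the function squashing (bounded), while the nonzero slope a on the
        # active segment keeps the backpropagated gradient alive there.
        return np.clip(a * x, 0.0, c)

    def tunable_squash_grad(x, a=1.0, c=1.0):
        # Derivative with respect to x: equal to a on the unsaturated
        # active segment, zero elsewhere.
        return np.where((x > 0) & (a * x < c), a, 0.0)

Unlike plain ReLU, such a function is bounded (for a > 0, normalizing by c gives the sigmoidal limits 0 and 1 assumed in Cybenko-type results), and unlike a sigmoid its derivative on the active segment is the constant a rather than a value shrinking toward zero.
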
Results. A new adaptive piecewise-linear activation function based on ReLU is introduced that is both squashing and protected against vanishing gradients. During training, not only the synaptic weights of the network but also the parameters of the activation function itself are adjusted. Using the proposed function makes it possible to reduce the number of neurons and hidden layers in the neural network, the number of required training samples, and the time needed to train the network.
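
Since the paper’s training procedure is not reproduced here, the sketch below shows only the general idea in PyTorch: a hypothetical TunableSquash module (the names a and c are illustrative) registers the activation parameters as trainable, so one optimizer updates them together with the synaptic weights through ordinary backpropagation.

    import torch
    from torch import nn

    class TunableSquash(nn.Module):
        # Illustrative learnable piecewise-linear squashing activation:
        # min(relu(a * x), c) with trainable slope a and ceiling c.
        # A plausible stand-in, not the authors' exact formulation.
        def __init__(self):
            super().__init__()
            self.a = nn.Parameter(torch.tensor(1.0))  # slope of active segment
            self.c = nn.Parameter(torch.tensor(1.0))  # saturation ceiling

        def forward(self, x):
            return torch.minimum(torch.relu(self.a * x), self.c)

    # Joint training: model.parameters() contains both the synaptic weights
    # and the activation parameters, so backpropagation adjusts them all.
    model = nn.Sequential(nn.Linear(8, 16), TunableSquash(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    x, y = torch.randn(32, 8), torch.randn(32, 1)
    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
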
Conclusions. An adaptive squashing activation function based on the widely used ReLU is introduced for deep neural networks, providing both universal approximation properties and protection against vanishing gradients. A training procedure using this function is proposed; it offers high performance and a simple numerical implementation. An additional circuit for tuning the parameters of the activation functions can be introduced quite simply into existing deep neural networks that use piecewise-linear activation functions.
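
As a hedged illustration of the last point, reusing the hypothetical TunableSquash module from the sketch above, retrofitting an existing ReLU network could amount to swapping the activation modules so that their parameters simply join the trainable set:

    from torch import nn

    def retrofit(module):
        # Recursively replace every nn.ReLU in an existing network with the
        # tunable activation (TunableSquash as sketched above), so that the
        # activation parameters are trained alongside the synaptic weights.
        for name, child in module.named_children():
            if isinstance(child, nn.ReLU):
                setattr(module, name, TunableSquash())
            else:
                retrofit(child)
        return module
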

Author Biographies

A. Yu. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv

Dr. Sc., Associate Professor at the Department of Informatics

Ye. V. Bodyanskiy, Kharkiv National University of Radio Electronics, Kharkiv

Dr. Sc., Professor at the Department of Artificial Intelligence

Ye. O. Shafronenko, Kharkiv National University of Radio Electronics, Kharkiv

Senior Lecturer at the Department of Media Engineering and Information Radio Electronic Systems

F. A. Brodetskyi, Kharkiv National University of Radio Electronics, Kharkiv

Senior Lecturer at the Department of Informatics

O. S. Tanianskyi, Kharkiv National University of Radio Electronics, Kharkiv

Post-graduate student at the Department of Informatics

References

Cybenko G. Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, 1989, Vol. 2, No. 4, pp. 303–314.

Hornik K., Stinchcombe M., White H. Multilayer feedforward networks are universal approximators, Neural Networks, 1989, Vol. 2, No. 5, pp. 359–366.

Hornik K. Approximation capabilities of multilayer feedforward networks, Neural Networks, 1991, Vol. 4, No. 2, pp. 251–257.

Poggio T., Girosi F. Networks for approximation and learning, Proceedings of the IEEE, 1990, Vol. 78, No. 9, pp. 1481–1497.

Haykin S. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, 2004.

Vapnik V. N. The Nature of Statistical Learning Theory. New York, Springer, 1995.

Cortes C. and Vapnik V. Support-vector networks, Machine Learning, 1995, Vol. 20, No. 3, pp. 273–297, https://doi.org/10.1007/bf00994018.

Bodyanskiy Ye., Zaychenko Yu. and Hamidov G. Hybrid Deep Learning Networks Based on Self-Organization and their Applications. Cambridge Scholars Publishing, 2024.

Kaczmarz S. Approximate solution of systems of linear equations, International Journal of Control, 1993, Vol. 57, No. 6, pp. 1269–1271.

Widrow B. and Hoff M. E. Adaptive switching circuits, 1960 IRE WESCON Convention Record, 1960, pp. 96–104.

Bodyanskiy Ye., Kolodyazhniy V. and Stephan A. An adaptive learning algorithm for a neuro-fuzzy network, International Conference on Computational Intelligence. Berlin, Heidelberg, Springer Berlin Heidelberg, 2001, pp. 68–75.

Published

2026-03-27

How to Cite

Shafronenko, A. Y., Bodyanskiy, Y. V., Shafronenko, Y. O., Brodetskyi, F. A., & Tanianskyi, O. S. (2026). TUNABLE SQUASHING ACTIVATION FUNCTION FOR DEEP NEURAL NETWORKS. Radio Electronics, Computer Science, Control, (1), 49–54. https://doi.org/10.15588/1607-3274-2026-1-5

Section

Neuroinformatics and intelligent systems