ANALYSIS OF PROCEDURES FOR VOICE SIGNAL NORMALIZATION AND SEGMENTATION IN INFORMATION SYSTEMS

Authors

  • M. S. Pastushenko, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • О. М. Pastushenko, Armed Forces of Ukraine, Ukraine
  • T. А. Faizulaiev, “TAF-87” LLC, Kharkiv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2025-4-17

Keywords:

authentication, voice signal, normalization, segmentation, formant data

Abstract

Context. The study addresses the task of estimating formant data (formant frequencies, their spectral density levels, the envelope of the amplitude-frequency spectrum, and the spectral width of the formant frequencies) in voice authentication systems. The object of the study is the digital preprocessing of the voice signal during formant data extraction.
Objective. To evaluate the effectiveness of traditional procedures for digital preprocessing of a user's voice signal and to develop proposals for improving the quality of formant data extraction.
Method. A mathematical model for extracting formant data from an experimental voice signal was developed to study how normalization and segmentation procedures affect the quality of the resulting estimates. Using this model, the results of digital processing of normalized and non-normalized voice signals are compared, and the influence of the processed frame duration on the quality of the formant frequency estimates is assessed. Results are obtained for an experimental phoneme and an experimental morpheme.
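The kind of comparison described in the Method paragraph can be illustrated with a minimal sketch. It is not the authors' model: the sampling rate, the synthetic two-tone stand-in for a vowel, the peak-normalization rule, the Hann window, and all function names are assumptions chosen for illustration. It relies only on the general fact that a frame of duration T gives a DFT bin spacing of 1/T, so a 40 ms frame resolves 25 Hz while a 10 ms frame resolves 100 Hz.

```python
import numpy as np

def peak_normalize(x):
    """Scale the signal so its largest absolute sample equals 1 (assumed rule)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def bin_spacing_hz(fs, frame_ms):
    """DFT bin spacing for a frame of frame_ms milliseconds at rate fs."""
    n = int(round(fs * frame_ms / 1000))
    return fs / n

def split_frames(x, fs, frame_ms):
    """Cut x into non-overlapping frames of frame_ms milliseconds."""
    n = int(round(fs * frame_ms / 1000))
    count = len(x) // n
    return x[:count * n].reshape(count, n)

def strongest_peak_hz(frame, fs):
    """Frequency of the largest magnitude-spectrum bin of a Hann-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

fs = 8000                      # assumed sampling rate, Hz
t = np.arange(fs // 5) / fs    # 200 ms of signal
rng = np.random.default_rng(0)
# Synthetic stand-in for a vowel: two assumed resonances plus weak noise.
signal = (0.3 * np.sin(2 * np.pi * 740 * t)
          + 0.15 * np.sin(2 * np.pi * 1220 * t)
          + 0.001 * rng.standard_normal(t.size))

for frame_ms in (10, 40):
    raw_frame = split_frames(signal, fs, frame_ms)[0]
    norm_frame = split_frames(peak_normalize(signal), fs, frame_ms)[0]
    print(f"{frame_ms} ms frame: bin spacing {bin_spacing_hz(fs, frame_ms)} Hz, "
          f"raw peak {strongest_peak_hz(raw_frame, fs)} Hz, "
          f"normalized peak {strongest_peak_hz(norm_frame, fs)} Hz")
```

In this toy setting, plain amplitude scaling leaves the location of the strongest spectral peak unchanged, and the longer frame places that peak on a finer frequency grid. The sketch does not attempt to reproduce the reported effect of normalization on the measured spectral width, which depends on the specific normalization and width measure used in the study.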
Results. When a voice signal with a sufficient signal-to-noise ratio is processed, normalization is not required for formant data extraction. Moreover, normalization makes the measurement of the spectral width of the formant frequencies less accurate. A processed frame duration of less than 40 ms is also unacceptable. These results make it possible to modify the traditional method of voice signal preprocessing. Applying the modeling method to the experimental voice signal confirms the reliability of the results.
Conclusions. The scientific novelty of the results lies in the modification of the voice signal preprocessing methodology in authentication systems. Eliminating normalization at high signal-to-noise ratios, which are typical of user authentication systems, speeds up formant data extraction and gives a more accurate estimate of the width of the formant frequency spectrum. Choosing a frame duration of at least 40 ms significantly improves the accuracy of formant frequency determination; with shorter frames, the formant frequencies are overestimated. Moreover, when phonemes are processed, the voice signal should not be divided into frames. In practice, the results make it possible to increase the efficiency and accuracy of formant data generation. Further research may examine how normalization and framing affect other elements of the authentication system user's template.

Author Biographies

M. S. Pastushenko, Kharkiv National University of Radio Electronics, Kharkiv

PhD, Professor, Professor at the V. V. Popovskyy Department of Infocommunication Engineering

О. М. Pastushenko, Armed Forces of Ukraine

PhD, Senior Researcher, Serviceman

T. А. Faizulaiev, “TAF-87” LLC, Kharkiv

Director

Published

2025-12-24

How to Cite

Pastushenko, M. S., Pastushenko, О. М., & Faizulaiev, T. А. (2025). ANALYSIS OF PROCEDURES FOR VOICE SIGNAL NORMALIZATION AND SEGMENTATION IN INFORMATION SYSTEMS. Radio Electronics, Computer Science, Control, (4), 194–201. https://doi.org/10.15588/1607-3274-2025-4-17

Issue

Section

Progressive information technologies