ANALYSIS OF PROCEDURES FOR VOICE SIGNAL NORMALIZATION AND SEGMENTATION IN INFORMATION SYSTEMS
DOI: https://doi.org/10.15588/1607-3274-2025-4-17
Keywords: authentication, voice signal, normalization, segmentation, formant data
Abstract
Context. The topical problem of estimating formant data (formant frequencies, their spectral density levels, the amplitude-frequency spectrum envelope, and the widths of the formant frequency spectra) in voice authentication systems is considered. The object of the study is the digital preprocessing of the voice signal during formant data extraction.
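As an illustration of how such formant data can be obtained, the sketch below estimates formant frequencies, their bandwidths (spectrum widths), and the spectral envelope from an LPC analysis of a single frame. This is not the authors' model: the function name, LPC order, pre-emphasis coefficient, and the use of librosa are assumptions made for illustration.

```python
# Minimal LPC-based sketch (assumed parameters, not the paper's exact model).
import numpy as np
from scipy.signal import lfilter, freqz
import librosa  # provides librosa.lpc(); any LPC routine could be used instead

def formant_estimates(frame, fs, order=12):
    """Estimate formant frequencies (Hz) and 3 dB bandwidths (Hz) of one frame."""
    emphasized = lfilter([1.0, -0.97], [1.0], frame.astype(float))  # pre-emphasis
    a = librosa.lpc(emphasized, order=order)        # LPC polynomial A(z)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]               # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)    # pole angle -> formant frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi       # pole radius -> bandwidth
    idx = np.argsort(freqs)
    # The LPC spectrum 1/|A(e^{jw})| serves as the amplitude-frequency envelope.
    w, h = freqz([1.0], a, worN=512, fs=fs)
    return freqs[idx], bws[idx], (w, np.abs(h))
```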
Objective. To evaluate the effectiveness of traditional digital preprocessing procedures applied to a user's voice signal and to develop proposals for improving the quality of formant data extraction.
Method. A mathematical model of formant data extraction from an experimental voice signal is developed to study how normalization and segmentation procedures affect the quality of the resulting estimates. By simulating the extraction process, the results of digitally processing normalized and non-normalized voice signals are compared, and the influence of the duration of the processed frame on the quality of formant frequency estimation is assessed. The results are obtained for an experimental phoneme and an experimental morpheme.
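A hedged sketch of the comparison idea follows: the same frame is analysed with and without amplitude normalization, and the resulting formant estimates are compared. The peak-normalization form and the reuse of the formant_estimates() sketch above are assumptions made for illustration, not the authors' procedure.

```python
# Sketch of the normalized vs. non-normalized comparison (assumed procedure).
import numpy as np

def peak_normalize(x, target=0.99):
    """Scale the signal so its maximum absolute amplitude equals `target`."""
    peak = np.max(np.abs(x))
    return x.astype(float) if peak == 0 else x.astype(float) * (target / peak)

# Hypothetical usage with the formant_estimates() sketch shown earlier:
# f_raw,  bw_raw,  _ = formant_estimates(frame, fs)
# f_norm, bw_norm, _ = formant_estimates(peak_normalize(frame), fs)
# delta_bw = bw_norm - bw_raw   # discrepancy in formant spectrum widths
```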
Results. The results show that when a voice signal with a sufficient signal-to-noise ratio is processed, normalization is not mandatory for formant data extraction; moreover, normalization leads to a less accurate measurement of the formant spectrum widths. It is also unacceptable to use a processed frame duration of less than 40 ms. These findings make it possible to modify the traditional voice signal preprocessing method. The use of simulation in the study of the experimental voice signal confirms the reliability of the results obtained.
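The 40 ms threshold is consistent with the elementary resolution argument that an analysis window of duration T cannot resolve spectral detail finer than roughly 1/T. The sketch below, with assumed frame and hop durations, splits a signal into frames and prints the corresponding resolution limits.

```python
# Frame splitting with an assumed 40 ms frame / 20 ms hop; the resolution
# limit 1/T explains why much shorter frames blur the formant structure.
import numpy as np

def frame_signal(x, fs, frame_ms=40.0, hop_ms=20.0):
    """Split a signal into overlapping frames of frame_ms milliseconds."""
    n = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    frames = [x[s:s + n] for s in range(0, len(x) - n + 1, hop)]
    return np.stack(frames) if frames else np.empty((0, n))

for t_ms in (10, 20, 40, 80):
    print(f"{t_ms} ms frame -> spectral resolution limit about {1000.0 / t_ms:.0f} Hz")
```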
Conclusions. The scientific novelty of the results lies in the modification of the voice signal preprocessing methodology used in authentication systems. Eliminating normalization at the high signal-to-noise ratios typical of user authentication systems increases the speed of formant data extraction and allows the width of the formant frequency spectrum to be estimated more accurately. Selecting a frame duration of at least 40 ms for the processed signal significantly improves the accuracy of formant frequency determination; with shorter frames, the formant frequency estimates are overestimated. Moreover, when individual phonemes are processed, the voice signal should not be divided into frames at all. Practical application of the results increases the speed and accuracy of formant data generation. Further research may examine the influence of normalization and framing procedures on other elements of the authentication system user's template.
License
Copyright (c) 2025 M. S. Pastushenko, О. М. Pastushenko, T. А. Faizulaiev

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.