A STUDY ON THE USE OF NORMALIZED L2-METRIC IN CLASSIFICATION TASKS
DOI:
https://doi.org/10.15588/1607-3274-2025-2-9
Keywords:
normalized Euclidean distance, machine learning, classification, t-SNE, similarity measures, k-Nearest Neighbors
Abstract
Context. In machine learning, similarity measures and distance metrics are pivotal in tasks such as classification, clustering, and dimensionality reduction. The effectiveness of traditional metrics, such as Euclidean distance, can be limited when applied to complex datasets. The object of the study is the process of data classification and dimensionality reduction in machine learning tasks, in particular the use of metric methods to assess the similarity between objects.
Objective. The study aims to evaluate the feasibility and performance of a normalized L2-metric (Normalized Euclidean Distance, NED) for improving the accuracy of machine learning algorithms, specifically in classification and dimensionality reduction.
Method. We prove mathematically that the normalized L2-metric satisfies the properties of boundedness, scale invariance, and monotonicity. It is shown that NED can be interpreted as a measure of dissimilarity between feature vectors. Its integration into the k-nearest neighbors (kNN) and t-SNE algorithms is investigated using a high-dimensional Alzheimer’s disease dataset. The study implemented four models combining different approaches to classification and dimensionality reduction. Model M1 used kNN with Euclidean distance and no dimensionality reduction, serving as a baseline; Model M2 used the normalized L2-metric in kNN; Model M3 applied t-SNE for dimensionality reduction followed by kNN based on Euclidean distance; and Model M4 used the normalized L2-metric in both the t-SNE reduction stage and the kNN classification stage. Hyperparameters were optimized for all models, including the number of neighbors, the voting type, and the perplexity parameter for t-SNE. Five-fold cross-validation was used to evaluate classification quality objectively. Additionally, the impact of data normalization on model accuracy was examined.
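The abstract does not state the NED formula, so the following is only a minimal sketch under an assumed definition, NED(x, y) = ||x - y|| / (||x|| + ||y||), which is one common normalized form that is bounded in [0, 1] and invariant to a common rescaling of both vectors. The function names `ned` and `knn_predict` and the toy data are hypothetical and are not taken from the study. The sketch shows how such a dissimilarity could be plugged into a majority-vote kNN classifier in place of Euclidean distance, roughly in the spirit of Model M2.

```python
import math
from collections import Counter


def ned(x, y):
    """Assumed normalized Euclidean distance: ||x - y|| / (||x|| + ||y||)."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    denom = math.sqrt(sum(a * a for a in x)) + math.sqrt(sum(b * b for b in y))
    # Two zero vectors are identical, so their dissimilarity is taken as 0.
    return diff / denom if denom > 0 else 0.0


def knn_predict(train_X, train_y, query, k=3, metric=ned):
    """Majority-vote kNN with a pluggable dissimilarity function."""
    # Rank training points by dissimilarity to the query, closest first.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: metric(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical, not from the study): class "a" lies near the
# first axis, class "b" near the second.
train_X = [(1.0, 0.1), (2.0, 0.3), (0.9, 0.0), (0.1, 1.0), (0.2, 2.1), (0.0, 0.8)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(ned((1.0, 2.0), (3.0, 1.0)))      # some value in [0, 1]
print(ned((10.0, 20.0), (30.0, 10.0)))  # same pair scaled by 10
print(knn_predict(train_X, train_y, (5.0, 0.5)))
```

Under this assumed definition, boundedness follows from the triangle inequality, and multiplying both vectors by the same positive constant leaves the value unchanged, so the two printed distances agree up to floating-point rounding.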
Results. Models using NED consistently outperformed models based on Euclidean distance. The highest classification accuracy, 91.4%, was achieved when NED was used in both t-SNE and kNN (Model M4). This emphasizes the adaptability of NED to complex data structures and its advantage in preserving key features in both high- and low-dimensional spaces.
Conclusions. The normalized L2-metric shows potential as an effective measure of dissimilarity for machine learning tasks. It improves the performance of algorithms while maintaining scalability and robustness, which indicates its suitability for various applications in high-dimensional data contexts.
License
Copyright (c) 2025 N. E. Kondruk

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.