THE METHOD OF ADAPTATION OF THE PARAMETERS OF ALGORITHMS FOR THE DETECTION AND CLEANING OF A STATISTICAL SAMPLE FROM ANOMALIES FOR DATA SCIENCE PROBLEMS

Authors

  • O. O. Pysarchuk National Technical University of Ukraine “Ihor Sikorskyi Kyiv Polytechnic Institute”, Ukraine
  • S. O. Pavlova National Technical University of Ukraine “Ihor Sikorskyi Kyiv Polytechnic Institute”, Ukraine
  • D. R. Baran National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2025-3-4

Keywords:

anomaly detection, dynamic error, statistical error, model optimization, Moving Window, Data Science, Big Data, time series

Abstract

Context. The growing use of Data Science in e-commerce, banking, and the control of dynamic objects raises the requirements for the efficiency of processing data in time series format. This also applies to the preparatory stage of data analysis, in which statistical samples are checked for and cleaned of anomalies such as gross measurement errors and missing values.
Objective. To develop a method for adapting the parameters of algorithms that detect and clean anomalies from time series statistical samples in Data Science problems.
Method. The article proposes a method for adapting the parameters of algorithms that detect and clean anomalies from a statistical sample in Data Science problems. Unlike similar practices, the proposed approach introduces an optimization step that minimizes the dynamic and statistical errors of the model; this step determines the settings of popular moving-window algorithms for cleaning a statistical sample of anomalies.
Result. Applying the proposed approach in Data Science practice makes it possible to develop software components for cleaning data of anomalies whose parameters are fitted solely from the structure and dynamics of the time series.
Conclusions. The key advantages of the proposed method are that it is simple to integrate into existing anomaly-cleaning algorithms and that the developer does not need to select the settings of the cleaning algorithms manually, which saves development time. The effectiveness of the proposed method is confirmed by the results of calculations.
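
The idea of choosing moving-window cleaning parameters by minimizing an error criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact error definitions, so the function names, the median/MAD cleaning rule, the grid search, and the error proxies (squared mean bias standing in for the dynamic error, error variance standing in for the statistical error) are all assumptions made for illustration. The sketch also assumes that an anomaly-free reference series is available, for example a synthetic series with a structure similar to the data.

```python
import itertools
import numpy as np


def clean_moving_window(y, window, k):
    """Replace points lying more than k robust sigmas from the local median."""
    y = np.asarray(y, dtype=float)
    cleaned = y.copy()
    half = window // 2
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        segment = y[lo:hi]  # the window is always taken from the raw data
        med = np.median(segment)
        sigma = 1.4826 * np.median(np.abs(segment - med))  # MAD-based sigma
        if sigma > 0 and abs(y[i] - med) > k * sigma:
            cleaned[i] = med
    return cleaned


def error_terms(cleaned, reference):
    """Hypothetical proxies: squared mean bias (dynamic), error variance (statistical)."""
    err = np.asarray(cleaned, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(err) ** 2), float(np.var(err))


def adapt_parameters(y, reference, windows, thresholds):
    """Grid-search the (window, k) pair that minimizes the summed error terms."""
    best_params, best_cost = None, np.inf
    for window, k in itertools.product(windows, thresholds):
        dynamic, statistical = error_terms(clean_moving_window(y, window, k), reference)
        cost = dynamic + statistical
        if cost < best_cost:
            best_params, best_cost = (window, k), cost
    return best_params, best_cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(200)
    reference = 0.002 * t**2 + rng.normal(0.0, 1.0, t.size)  # trend + noise
    y = reference.copy()
    y[rng.choice(t.size, 10, replace=False)] += 25.0  # injected anomalies
    params, cost = adapt_parameters(y, reference, [5, 11, 21, 41], [2.0, 3.0, 4.0])
    print("selected (window, k):", params, "cost:", round(cost, 4))
```

In this sketch a window that is too short gives noisy median and spread estimates, while a window that is too long lets the trend change inside the window and distorts the local median; the search balances the two effects without manual tuning.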

Author Biographies

O. O. Pysarchuk, National Technical University of Ukraine “Ihor Sikorskyi Kyiv Polytechnic Institute”

Dr. Sc., Professor, Professor of the Department of Computer Engineering, Faculty of Informatics and Computing

S. O. Pavlova, National Technical University of Ukraine “Ihor Sikorskyi Kyiv Polytechnic Institute”

Student of the Faculty of Informatics and Computing

D. R. Baran, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Assistant of the Department of Computer Engineering, Faculty of Informatics and Computing

Published

2025-09-22

How to Cite

Pysarchuk, O. O., Pavlova, S. O., & Baran, D. R. (2025). THE METHOD OF ADAPTATION OF THE PARAMETERS OF ALGORITHMS FOR THE DETECTION AND CLEANING OF A STATISTICAL SAMPLE FROM ANOMALIES FOR DATA SCIENCE PROBLEMS. Radio Electronics, Computer Science, Control, (3), 37–44. https://doi.org/10.15588/1607-3274-2025-3-4

Section

Mathematical and computer modelling