THE METHOD OF ADAPTATION OF THE PARAMETERS OF ALGORITHMS FOR THE DETECTION AND CLEANING OF A STATISTICAL SAMPLE FROM ANOMALIES FOR DATA SCIENCE PROBLEMS
DOI: https://doi.org/10.15588/1607-3274-2025-3-4

Keywords: anomaly detection, dynamic error, statistical error, model optimization, Moving Window, Data Science, Big Data, time series

Abstract
Context. The growing popularity of Data Science in e-commerce, the banking sector, and the control of dynamic objects raises the requirements for the efficiency of processing data in the Time Series format. This also applies to the preparatory stage of data analysis: detecting anomalies, such as gross measurement errors and omissions, and cleaning the statistical sample of them.
Objective. To develop a method for adapting the parameters of algorithms that detect anomalies in, and clean, a statistical sample in the Time Series format for Data Science problems.
Method. The article proposes a method for adapting the parameters of algorithms that detect and clean anomalies in a statistical sample for Data Science problems. The proposed approach builds on, and differs from, similar practices by introducing an optimization step that minimizes the dynamic and statistical error of the model; the result of this optimization determines the tuning parameters of popular algorithms for cleaning a statistical sample of anomalies with the Moving Window method.
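The moving-window cleaning step whose parameters the method tunes can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name `clean_anomalies`, the use of a median/MAD decision rule, and the defaults for the window length and the threshold multiplier `k` are all assumptions; they stand in for whatever popular cleaning algorithm the method is applied to.

```python
import numpy as np

def clean_anomalies(series, window=11, k=3.0):
    """Replace gross outliers in a 1-D series with the local window median.

    A window of `window` samples is centred on each point; a point whose
    deviation from the window median exceeds k * (robust sigma) is treated
    as an anomaly and replaced. `window` and `k` are the tuning parameters
    that the proposed method would adapt automatically.
    """
    x = np.asarray(series, dtype=float)
    cleaned = x.copy()
    half = window // 2
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        win = x[lo:hi]
        med = np.median(win)
        # Median absolute deviation, scaled to estimate sigma for normal noise
        sigma = 1.4826 * np.median(np.abs(win - med))
        if sigma > 0 and abs(x[i] - med) > k * sigma:
            cleaned[i] = med  # replace the gross error with the local median
    return cleaned
```

Because the decision rule uses the local median and MAD rather than the mean and standard deviation, a single gross error does not inflate the threshold that is supposed to catch it.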
Result. Introducing the proposed approach into Data Science practice enables the development of software components for cleaning data of anomalies whose tuning parameters are learned purely from the structure and dynamics of the Time Series.
Conclusions. The key advantage of the proposed method is its simple integration into existing algorithms for cleaning a sample of anomalies, and the fact that the developer no longer needs to select the tuning parameters of the cleaning algorithms manually, which saves development time. The effectiveness of the proposed method is confirmed by the results of calculations.
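The adaptation idea of the abstract, choosing window parameters by minimizing the combined dynamic and statistical error, can be sketched as a grid search under simplifying assumptions: a moving-average model, a noise estimate from first differences, and the standard lag-bias approximation for an N-point moving average (proportional to the signal curvature times (N² − 1)/24). The helper name `adapt_window` and every internal estimate here are illustrative assumptions, not the paper's formulas.

```python
import numpy as np

def adapt_window(series, max_n=41):
    """Pick an odd moving-window length N that minimises the estimated
    statistical error (sigma / sqrt(N), falls with N) plus the dynamic
    error (smoothing lag bias, grows with N)."""
    x = np.asarray(series, dtype=float)
    # Noise level from first differences: Var(diff) = 2 * sigma^2 for white noise
    sigma = np.std(np.diff(x)) / np.sqrt(2)
    # Curvature estimate from a heavily smoothed copy (suppresses noise in f'')
    ref = np.convolve(x, np.ones(9) / 9, mode="same")
    curv = np.mean(np.abs(np.diff(ref, 2)))
    best_n, best_err = 1, sigma  # N = 1: no smoothing, no dynamic error
    for n in range(3, max_n, 2):
        stat_err = sigma / np.sqrt(n)            # residual noise after averaging
        dyn_err = curv * (n * n - 1) / 24.0      # lag bias of an N-point average
        if stat_err + dyn_err < best_err:
            best_n, best_err = n, stat_err + dyn_err
    return best_n
```

The trade-off is the one the abstract describes: a wider window suppresses more statistical noise but lags the true dynamics more, so the minimum of the summed error selects a window matched to the structure of the particular Time Series.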
License
Copyright (c) 2025 O. O. Pysarchuk , S. O. Pavlova, D. R. Baran

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows reuse and remixing of its content in accordance with a Creative Commons license CC BY-SA.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.