INFORMATION TECHNOLOGY FOR DETECTION OF DISINFORMATION SOURCES AND INAUTHENTICAL BEHAVIOR OF CHAT USERS BASED ON NLP AND MACHINE LEARNING METHODS
DOI:
https://doi.org/10.15588/1607-3274-2025-3-13
Keywords:
disinformation, source of disinformation, way of disinformation dissemination, disinformation dissemination network, fake, propaganda, natural language processing, stylistic analysis
Abstract
Context. In the modern digital environment, the spread of disinformation and inauthentic behaviour of users in chat rooms poses a serious threat to society. Natural language processing and machine learning methods offer effective approaches to detecting and countering such threats.
Objective. The study aims to develop an information technology for automatically detecting sources that spread Ukrainian-language fake news and inauthentic behaviour of chat users, built on natural language processing methods and implemented with machine learning technologies.
Method. Features were constructed using the TF-IDF statistical indicator, the Bag of Words vectorization model, and part-of-speech mark-up. In other experiments, the FastText, Word2Vec (W2V), and GloVe word vectorization models were used to obtain vector representations of words and to recognize trigger words (reinforcing words, absolute pronouns, and “shiny” words). The idea is to find messages that are similar in text or meaning (lexically or semantically) and to analyse how those similar messages are distributed in time and space. Complement Naïve Bayes, Gaussian Naïve Bayes, HistGradientBoostingClassifier, MultinomialNB, and Random Forest were used as the main modelling algorithms to identify sources of disinformation and inauthentic chat behaviour.
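The idea of finding lexically similar messages can be illustrated with a minimal sketch. It assumes whitespace tokenization, a smoothed IDF formula, and a hypothetical similarity threshold of 0.5; none of these details are stated in the abstract, and the sample messages are invented for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; a real pipeline would also
    # strip punctuation and lemmatize Ukrainian word forms.
    return text.lower().split()


def tfidf_vectors(messages):
    # Term counts per message.
    docs = [Counter(tokenize(m)) for m in messages]
    n_docs = len(docs)
    # Document frequency: number of messages containing each term.
    df = Counter(term for doc in docs for term in doc)
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(messages, threshold=0.5):
    # Return (i, j, score) for every pair of messages whose TF-IDF
    # cosine similarity reaches the (hypothetical) threshold.
    vecs = tfidf_vectors(messages)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Invented sample messages, not taken from the study's data.
messages = [
    "breaking news the city has fallen share now",
    "breaking news the city has fallen share this now",
    "weather tomorrow will be sunny and warm",
]
pairs = similar_pairs(messages)
for i, j, score in pairs:
    print(f"messages {i} and {j} look similar (cosine = {score:.2f})")
```

Pairs flagged this way could then be grouped by author and timestamp to examine how near-duplicate messages spread, which is the time-and-space analysis the abstract mentions.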
Results. This article discusses the development of software for detecting propaganda messages in social networks based on the analysis of Twitter text data. It focuses on the methods of text pre-processing, data vectorization, and machine learning for message classification. The process of collecting, preparing, and cleaning data is described, and various approaches to training the model and evaluating its effectiveness are considered. Nine experiments were conducted for the selected methods of postprocessing data, vectorization models, and modelling algorithms.
Conclusions. The created models show excellent results in recognizing sources of propaganda, fakes, and disinformation in social networks and online media. The best results so far are shown by experiment 5, which uses the main combination of TF-IDF + Complement Naïve Bayes. The high recall value for class 1 (0.8) means that the model finds positive samples well, but for class 0 it is less effective (0.56). The correspondingly high precision value for class 1 (0.89) means that most of the samples predicted as class 1 are correct. The low precision for class 0 (0.38) indicates that many of the samples predicted as class 0 are incorrect. At the same time, certain anomalies are observed in the series of experiments (in particular, in experiment 7, based on GloVe + Random Forest), which require further research. The results obtained can be used to further improve the algorithms for detecting sources of disinformation, inauthentic chat behaviour, and malicious content to increase the country’s transparency.
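The per-class precision and recall figures above follow from a binary confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical, chosen only so that the resulting values land close to the reported ones, and are not the study's actual confusion matrix.

```python
def precision_recall(true_pos, false_pos, false_neg):
    # precision = TP / (TP + FP); recall = TP / (TP + FN).
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (assumption, not data from the study):
# 445 class 1 samples and 100 class 0 samples.
tp, fn = 356, 89   # class 1 samples predicted as 1 / as 0
tn, fp = 56, 44    # class 0 samples predicted as 0 / as 1

# Class 1: its false positives are class 0 samples predicted as 1.
p1, r1 = precision_recall(tp, fp, fn)
# Class 0: roles swap, so its false positives are class 1 samples predicted as 0.
p0, r0 = precision_recall(tn, fn, fp)

print(f"class 1: precision = {p1:.2f}, recall = {r1:.2f}")
print(f"class 0: precision = {p0:.2f}, recall = {r0:.2f}")
```

With a class 1 majority like this, even a modest share of missed class 1 samples (the 89 above) can outnumber the correctly identified class 0 samples, which is one way a low class 0 precision can coexist with a high class 1 precision.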
License
Copyright (c) 2025 V. Vysotska

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows reuse and remixing of its content in accordance with the Creative Commons license CC BY-SA.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.