INFORMATION TECHNOLOGY FOR DETECTING DISINFORMATION SOURCES AND INAUTHENTIC BEHAVIOUR OF CHAT USERS BASED ON NLP AND MACHINE LEARNING METHODS

V. Vysotska

doi:10.15588/1607-3274-2025-3-13

Authors

V. Vysotska, Lviv Polytechnic National University, Lviv, Ukraine


Keywords:

disinformation, source of disinformation, way of disinformation dissemination, disinformation dissemination network, fake, propaganda, natural language processing, stylistic analysis

Abstract

Context. In the modern digital environment, the spread of disinformation and inauthentic behaviour of users in chat rooms poses a serious threat to society. Natural language processing and machine learning methods offer effective approaches to detecting and countering such threats.
Objective. The study aims to develop an information technology for automatically detecting the sources and spread of Ukrainian-language fake news and the inauthentic behaviour of chat users, built on natural language processing methods and implemented with machine learning technologies.
Method. Features were constructed using the TF-IDF statistical measure, the Bag of Words vectorization model and part-of-speech tagging. In other experiments, the FastText, W2V (Word2Vec) and GloVe word-embedding models were used to obtain vector representations of words and to recognize trigger words (intensifying words, absolute pronouns and “shiny” words). The idea is to find messages that are similar in wording (lexically) or in meaning (semantically) and to analyse how such similar messages are distributed in time and space. Complement Naïve Bayes, Gaussian Naïve Bayes, HistGradientBoostingClassifier, MultinomialNB and Random Forest were used as the main modelling algorithms for identifying sources of disinformation and inauthentic chat behaviour.
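As an illustration of one feature/classifier pair from the list above, TF-IDF weighting followed by Complement Naïve Bayes, the following standard-library Python sketch may help. It is not the author's implementation: the smoothed IDF formula and L2 normalization are assumed to mirror scikit-learn's defaults, and the function names, toy corpus and labels are hypothetical.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Learn smoothed IDF weights from tokenized documents.

    Assumption: idf = ln((1 + n) / (1 + df)) + 1 (scikit-learn's default),
    not necessarily the weighting used in the article.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_transform(doc, idf):
    """Turn one tokenized document into an L2-normalized sparse TF-IDF dict.

    Terms not seen during fitting are ignored.
    """
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


class ComplementNB:
    """Complement Naive Bayes (Rennie et al., 2003) over sparse feature dicts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing applied to every feature

    def fit(self, X, y):
        self.classes = sorted(set(y))
        vocab = {t for x in X for t in x}
        self.weights = {}
        for c in self.classes:
            # Accumulate feature mass from all documents NOT labelled c.
            comp = Counter()
            for x, label in zip(X, y):
                if label != c:
                    comp.update(x)
            total = sum(comp.values()) + self.alpha * len(vocab)
            self.weights[c] = {
                t: math.log((comp[t] + self.alpha) / total) for t in vocab
            }
        return self

    def predict(self, x):
        # Pick the class whose complement fits the document worst, i.e. the
        # lowest weighted sum of complement log-probabilities.
        def complement_score(c):
            w = self.weights[c]
            return sum(v * w[t] for t, v in x.items() if t in w)

        return min(self.classes, key=complement_score)


# Hypothetical, already tokenized toy corpus; labels 1 and 0 are illustrative.
docs = [
    ["shocking", "truth", "they", "hide"],
    ["shocking", "secret", "everyone", "knows"],
    ["city", "council", "approved", "budget"],
    ["council", "published", "budget", "report"],
]
labels = [1, 1, 0, 0]

idf = tfidf_fit(docs)
model = ComplementNB(alpha=1.0).fit([tfidf_transform(d, idf) for d in docs], labels)
print(model.predict(tfidf_transform(["shocking", "secret", "truth"], idf)))  # 1
```

Complement Naïve Bayes estimates each class's parameters from the documents of all other classes, which is why it is often preferred to MultinomialNB when the classes are imbalanced.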
Results. This article describes the development of software for detecting propaganda messages in social networks based on the analysis of Twitter text data. The focus is on text pre-processing, data vectorization and machine learning methods for message classification. The process of collecting, preparing and cleaning the data is described, and several approaches to training the model and evaluating its effectiveness are considered. Nine experiments were conducted with the selected data-processing methods, vectorization models and modelling algorithms.
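The pre-processing stage described above (cleaning Twitter text before vectorization) might look like the sketch below. The specific rules, removing URLs and @mentions, dropping the # sign from hashtags, normalizing the Ukrainian apostrophe and keeping only Latin and Cyrillic letters, are assumptions for illustration rather than the article's documented pipeline; the function name preprocess is hypothetical, and lemmatization and part-of-speech tagging are omitted.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Latin and Cyrillic letters (including Ukrainian і, ї, є, ґ) plus the apostrophe
# used inside Ukrainian words; everything else acts as a separator, so "#ЗСУ"
# yields the token "зсу".
TOKEN_RE = re.compile(r"[a-zа-яіїєґ']+")


def preprocess(text, stopwords=frozenset()):
    """Lowercase and clean one message, returning a list of tokens (assumed rules)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Normalize typographic apostrophes (’ and ʼ) to the ASCII one.
    text = text.replace("\u2019", "'").replace("\u02bc", "'")
    tokens = (t.strip("'") for t in TOKEN_RE.findall(text))
    return [t for t in tokens if t and t not in stopwords]


print(preprocess("RT @user Дивіться: https://t.co/abc #ЗСУ перемагають!!!", {"rt"}))
# ['дивіться', 'зсу', 'перемагають']
```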
Conclusions. The created models show excellent results in recognizing sources of propaganda, fakes and disinformation in social networks and online media. So far, the best results are achieved in experiment 5, based on TF-IDF + Complement Naïve Bayes. The high recall for class 1 (0.8) means that the model finds most positive samples, whereas recall for class 0 is lower (0.56). The high precision for class 1 (0.89) means that most samples predicted as class 1 are correct, while the low precision for class 0 (0.38) indicates that many samples predicted as class 0 are misclassified. At the same time, certain anomalies are observed in the series of experiments (in particular, in experiment 7, based on GloVe + Random Forest), which require further research. The results obtained can be used to further improve algorithms for detecting sources of disinformation, inauthentic chat behaviour and malicious content, thereby increasing the country’s transparency.
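To make the per-class metrics concrete, the sketch below computes precision and recall for each class from true and predicted labels. The labels are hypothetical and deliberately imbalanced (eight class-1 samples, two class-0 samples); they do not reproduce the article's data, and the helper name per_class_precision_recall is illustrative.

```python
def per_class_precision_recall(y_true, y_pred, classes=(0, 1)):
    """Return {class: (precision, recall)} computed from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted as c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted as c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # samples of c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[c] = (precision, recall)
    return report


# Hypothetical labels: eight class-1 samples and two class-0 samples.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1]
for c, (p, r) in per_class_precision_recall(y_true, y_pred).items():
    print(f"class {c}: precision={p:.2f} recall={r:.2f}")
```

Here class 0 has only two true samples, so the two class-1 samples predicted as class 0 outnumber its single true positive and pull its precision down to 1/3, while class 1 keeps precision 6/7 (about 0.86) and recall 0.75.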

Author Biography

V. Vysotska, Lviv Polytechnic National University, Lviv, Ukraine

PhD, Professor, Department of Information Systems and Networks

