INFORMATION TECHNOLOGY FOR RECOGNIZING PROPAGANDA, FAKES AND DISINFORMATION IN TEXTUAL CONTENT BASED ON NLP AND MACHINE LEARNING METHODS

V. Vysotska

doi:10.15588/1607-3274-2024-2-13

Authors

V. Vysotska, Lviv Polytechnic National University, Lviv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2024-2-13

Keywords:

disinformation, fake, propaganda, linguistic analysis, natural language processing, machine learning, cyber warfare, artificial intelligence, semantic analysis, information security

Abstract

Context. The research applies artificial intelligence to the development and improvement of cyber warfare tools, in particular for combating disinformation, fakes and propaganda on the Internet and for identifying sources of disinformation and the inauthentic behavior (bots) of coordinated groups. The project addresses the currently relevant problem of information manipulation in the media: fighting distortion and disinformation effectively requires a tool that recognizes these phenomena in textual data, so that a strategy for preventing their spread can then be developed.

Objective. The objective of the study is to develop an information technology for the automatic recognition of political propaganda in textual data, built on supervised machine learning and implemented using natural language processing methods.

Method. Propaganda is recognized at two levels: at the general level, that is, the level of the whole document, and at the level of individual sentences. Features were constructed using the TF-IDF statistical measure, the "Bag of Words" vectorization model, part-of-speech tagging, the word2vec model for obtaining vector representations of words, and the recognition of trigger words (reinforcing words, absolute pronouns and "shiny" words). Logistic regression was used as the main modeling algorithm.
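The feature construction and classification steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the author's implementation: the trigger-word list, the toy documents and their labels are hypothetical, part-of-speech tagging and word2vec are omitted, and the small gradient-descent logistic regression stands in for whatever library the study actually used.

```python
import math
import re
from collections import Counter

# Hypothetical trigger-word lexicon (assumption; the study's list is not given here).
TRIGGER_WORDS = {"always", "never", "everyone", "nobody", "absolutely", "totally"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())

def split_sentences(document):
    """Naive sentence splitter on '.', '!' and '?' (enough for a toy example)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]

def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def featurize(tokens, idf, vocab):
    """TF-IDF weights over the vocabulary plus the share of trigger words."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    features = [counts[term] / total * idf[term] for term in vocab]
    features.append(sum(counts[w] for w in TRIGGER_WORDS) / total)
    return features

def probability(x, weights, bias):
    """Logistic regression output: estimated probability of the positive class."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = probability(x, weights, bias) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Toy, made-up corpus: 1 = propaganda, 0 = not propaganda.
docs = [
    "Everyone knows the enemy always lies. They will never stop!",
    "The council approved the budget on Tuesday. The vote was 7 to 2.",
    "Our glorious nation is absolutely right. Nobody can doubt it!",
    "Rainfall was below average this month. Farmers expect lower yields.",
]
labels = [1, 0, 1, 0]

# Document level: one feature vector per article.
token_lists = [tokenize(d) for d in docs]
idf = fit_idf(token_lists)
vocab = sorted(idf)
X = [featurize(tokens, idf, vocab) for tokens in token_lists]
weights, bias = train_logreg(X, labels)
doc_predictions = [int(probability(x, weights, bias) >= 0.5) for x in X]

# Sentence level: the same featurizer applied to each sentence of one article.
sentences = split_sentences(docs[0])
sentence_scores = [
    probability(featurize(tokenize(s), idf, vocab), weights, bias) for s in sentences
]
print(doc_predictions, sentence_scores)
```

For brevity the sketch scores sentences with the document-level weights; the abstract describes separate models for the two levels, which would mean repeating the training step on sentence-labelled data.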

Results. Machine learning models have been developed to recognize propaganda, fakes and disinformation at the document (article) level and at the sentence level. Both model scores are satisfactory, but the document-level model scored about 20 percentage points higher than the sentence-level model (roughly 1.27 times its score).

Conclusions. The created models show good results in recognizing propaganda, fakes and disinformation in textual content based on NLP and machine learning methods. Analysis of the raw data showed that the document-level (article-level) model correctly classified 6097 non-propaganda articles and 694 propaganda articles, and misclassified 123 propaganda articles and 285 non-propaganda articles. The resulting model score (accuracy) is 0.9433. The sentence-level model correctly classified 205 propaganda sentences and 1917 non-propaganda sentences and misclassified 731, giving a model score of 0.7438.
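The reported scores are consistent with plain accuracy computed from the counts above, as the short check below shows. Precision, recall and F1 for the propaganda class are not reported in the abstract; they are derived here on the assumption that the 123 misclassified propaganda articles are false negatives and the 285 misclassified non-propaganda articles are false positives.

```python
def accuracy(correct, incorrect):
    """Share of correctly classified items."""
    return correct / (correct + incorrect)

# Document-level counts as reported in the abstract.
true_neg, true_pos = 6097, 694
false_neg, false_pos = 123, 285  # assumed mapping, see the note above

doc_accuracy = accuracy(true_neg + true_pos, false_neg + false_pos)
doc_precision = true_pos / (true_pos + false_pos)
doc_recall = true_pos / (true_pos + false_neg)
doc_f1 = 2 * doc_precision * doc_recall / (doc_precision + doc_recall)

# Sentence-level counts: only the total number of errors (731) is reported,
# so accuracy is the only metric that can be recomputed.
sent_accuracy = accuracy(205 + 1917, 731)

print(f"document accuracy:  {doc_accuracy:.4f}")
print(f"document precision: {doc_precision:.4f}")
print(f"document recall:    {doc_recall:.4f}")
print(f"document F1:        {doc_f1:.4f}")
print(f"sentence accuracy:  {sent_accuracy:.4f}")
print(f"difference: {doc_accuracy - sent_accuracy:.4f}, ratio: {doc_accuracy / sent_accuracy:.2f}")
```

Because non-propaganda articles heavily outnumber propaganda articles in these counts, accuracy alone can look stronger than the per-class figures, which is why the derived precision and recall are worth inspecting alongside it.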

Author Biography

V. Vysotska, Lviv Polytechnic National University, Lviv, Ukraine

Dr. Sc., Associate Professor of Information Systems and Networks Department
