RESEARCH ON A HYBRID LSTM-CNN-ATTENTION MODEL FOR TEXT-BASED WEB CONTENT CLASSIFICATION

Authors

  • M. V. Kuz, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine
  • I. M. Lazarovych, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine
  • M. I. Kozlenko, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine
  • M. V. Pikuliak, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine
  • A. D. Kvasniuk, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2025-4-10

Keywords:

web content classification, LSTM-CNN-Attention, deep learning, natural language processing, GloVe embeddings, text classification, hybrid model, sequence modeling

Abstract

Context. Text-based web content classification plays a pivotal role in various natural language processing (NLP) tasks, including fake news detection, spam filtering, content categorization, and automated moderation. As the scale and complexity of textual data on the web continue to grow, traditional classification approaches – especially those relying on manual feature engineering or shallow learning techniques – struggle to capture the nuanced semantic relationships and structural variability of modern web content. These limitations result in reduced adaptability and poor generalization performance on real-world data. Therefore, there is a clear need for advanced models that can simultaneously learn local linguistic patterns and understand the broader contextual meaning of web text.
Method. This study presents a hybrid deep learning architecture that integrates Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN), and an Attention mechanism to enhance the classification of web content based on text. Pretrained GloVe embeddings are used to represent words as dense vectors that preserve semantic similarity. The CNN layer extracts local n-gram patterns and lexical features, while the LSTM layer models long-range dependencies and sequential structure. The integrated Attention mechanism enables the model to focus selectively on the most informative parts of the input sequence. The model was evaluated on a dataset of over 10,000 HTML-based web pages annotated as legitimate or fake. A 5-fold cross-validation setup was used to assess the robustness and generalizability of the proposed solution.
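
For illustration only, the following minimal sketch shows how such a hybrid GloVe + CNN + LSTM + Attention classifier could be assembled in TensorFlow/Keras. This is not the authors' published implementation; the vocabulary size, sequence length, filter counts and unit counts are illustrative assumptions rather than values reported in the paper.

    # Minimal sketch of a hybrid CNN-LSTM-Attention text classifier with
    # pretrained GloVe embeddings (assumed: TensorFlow/Keras, binary labels).
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    VOCAB_SIZE = 20_000   # assumed vocabulary size
    MAX_LEN = 500         # assumed maximum token sequence length
    EMBED_DIM = 100       # assumed GloVe vector dimensionality

    def build_model(embedding_matrix: np.ndarray) -> tf.keras.Model:
        inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")

        # Pretrained GloVe embeddings: words as dense semantic vectors.
        x = layers.Embedding(
            VOCAB_SIZE, EMBED_DIM,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False)(inputs)

        # CNN layer: extracts local n-gram and lexical patterns.
        x = layers.Conv1D(128, kernel_size=5, activation="relu", padding="same")(x)
        x = layers.MaxPooling1D(pool_size=2)(x)

        # LSTM layer: models long-range dependencies and sequence structure.
        h = layers.LSTM(64, return_sequences=True)(x)

        # Attention: re-weights the most informative time steps before pooling.
        attn = layers.Attention()([h, h])
        context = layers.GlobalAveragePooling1D()(attn)

        # Binary output: legitimate vs. fake web page.
        outputs = layers.Dense(1, activation="sigmoid")(context)
        model = models.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model
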
Results. Experimental results show that the hybrid LSTM-CNN-Attention model achieved outstanding performance, with an accuracy of 0.98, precision of 0.94, recall of 0.92, and F1-score of 0.93. These results surpass the performance of baseline models based solely on CNNs, LSTMs, or transformer-based classifiers such as BERT. The combination of neural network components enabled the model to effectively capture both fine-grained text structures and broader semantic context. Furthermore, the use of GloVe embeddings provided an efficient and effective representation of textual data, making the model suitable for integration into systems with real-time or near-real-time requirements.
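
The evaluation protocol described above (5-fold cross-validation with accuracy, precision, recall and F1-score) could be reproduced along the following lines, assuming scikit-learn and the build_model sketch from the previous block; the number of epochs and the batch size are illustrative assumptions, not settings reported in the paper.

    # Sketch of stratified 5-fold cross-validation with the four reported metrics.
    # X: padded token-id matrix, y: 0/1 labels, embedding_matrix: GloVe weights.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    def cross_validate(X, y, embedding_matrix, n_splits=5):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
        fold_scores = []
        for train_idx, test_idx in skf.split(X, y):
            model = build_model(embedding_matrix)          # from the previous sketch
            model.fit(X[train_idx], y[train_idx], epochs=5, batch_size=64, verbose=0)
            y_pred = (model.predict(X[test_idx]) > 0.5).astype(int).ravel()
            fold_scores.append({
                "accuracy":  accuracy_score(y[test_idx], y_pred),
                "precision": precision_score(y[test_idx], y_pred),
                "recall":    recall_score(y[test_idx], y_pred),
                "f1":        f1_score(y[test_idx], y_pred),
            })
        # Average the per-fold metrics to assess robustness and generalization.
        return {k: float(np.mean([s[k] for s in fold_scores])) for k in fold_scores[0]}
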
Conclusions. The proposed hybrid architecture demonstrates high effectiveness in text-based web content classification, particularly in tasks requiring both syntactic feature extraction and semantic interpretation. By combining convolutional, recurrent, and attention-based mechanisms, the model addresses the limitations of individual architectures and achieves improved generalization. These findings support the broader use of hybrid deep learning approaches in NLP applications, especially where complex, unstructured textual data must be processed and classified with high reliability.

Author Biographies

M. V. Kuz, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine

Dr. Sc., Professor of the Department of Information Technology

I. M. Lazarovych, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine

PhD, Associate Professor of the Department of Information Technology

M. I. Kozlenko, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine

PhD, Associate Professor of the Department of Information Technology

M. V. Pikuliak, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine

PhD, Associate Professor of the Department of Information Technology

A. D. Kvasniuk, Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, Ukraine

Graduate student

References

Evans K., Abuadbba A., Wu T., Moore K., Ahmed M., Pogrebna G., Nepal S., Johnstone M. RAIDER: Reinforcement-aided Spear Phishing Detector, Network and System Security, 2022, Vol. 13787, pp. 23–59. DOI: https://doi.org/10.1007/978-3-031-23020-2_2.

Kozlenko M., Lazarovych I., Tkachuk V., Kuz M. Deep Learning Demodulation of Amplitude Noise Shift Keying Spread Spectrum Signals, IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&T). Kharkiv, Ukraine, 2020, pp. 717–720. DOI: https://doi.org/10.1109/PICST51311.2020.9468063.

Dhar A., Mukherjee H., Roy K., Santosh K.C., Dash N.S. Hybrid approach for text categorization: A case study with Bangla news article, Journal of Information Science, 2023, Vol. 49(3), pp. 762–777. DOI: https://doi.org/10.1177/01655515211027770.

Vörös T., Bergeron S.P., Berlin K. Web Content Filtering through knowledge distillation of Large Language Models [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.2305.05027.

Verma P., Goyal A., Gigras Y. Email phishing: Text classification using natural language processing, Computer Science and Information Technologies, 2020, Vol. 1(1), pp. 1–12. DOI: https://doi.org/10.11591/csit.v1i1.p1-12.

Somesha M., Pais A. R. Classification of Phishing Email Using Word Embedding and Machine Learning Techniques, Journal of Cyber Security and Mobility, 2022, Vol. 11(3), pp. 279–320. DOI: https://doi.org/10.13052/jcsm2245-1439.1131.

Asudani D. S., Nagwani N. K., Singh P. Impact of word embedding models on text analytics in deep learning environment: a review, Artificial Intelligence Review, 2023, Vol. 56, pp. 10345–10425. DOI: https://doi.org/10.1007/s10462-023-10419-1.

Petridis C. Text Classification: Neural Networks VS Machine Learning Models VS Pre-trained Models [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.2412.21022.

Gurumurthy M., Chitra K. Email Phishing Detection Model using CNN Model, Journal of Innovation and Technology, 2024, Vol. 2024(43), pp. 1–8. DOI: https://doi.org/10.61453/joit.v2024no43.

Kozlenko M., Lazarovych I., Tkachuk V., Vialkova V. Software Demodulation of Weak Radio Signals Using Convolutional Neural Network, IEEE 7th International Conference on Energy Smart Systems (ESS). Kyiv, Ukraine, 2020, pp. 339–342. DOI: https://doi.org/10.1109/ESS50319.2020.9160035.

Benavides-Astudillo E., Fuertes W., Sanchez-Gordon S., Nuñez-Agurto D., Rodríguez-Galán G. A Phishing-Attack Detection Model Using Natural Language Processing and Deep Learning, Applied Sciences, 2023, Vol. 13(9), P. 5275. DOI: https://doi.org/10.3390/app13095275.

Maurer M.E. Phishload [Electronic resource]. Access mode: https://www.medien.ifi.lmu.de/team/max.maurer/files/phishload.

Tawil A.A., Almazaydeh L., Qawasmeh D., Qawasmeh B., Alshinwan M., Elleithy K. Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT, Computers, Materials & Continua, 2024, Vol. 81(2), pp. 3395–3412. DOI: https://doi.org/10.32604/cmc.2024.057279.

Camacho-Collados J., Pilehvar M.T. On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.1707.01780.

Joshi A., Lloyd L., Westin P., Seethapathy S. Using Lexical Features for Malicious URL Detection – A Machine Learning Approach [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.1910.06277.

Giorgi J., Nitski O., Wang B., Bader G. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.2006.03659.

Priya R., Sujatha S. W2V and Glove Embedding based Sentiment Analysis of Text Messages, International Journal of Advanced Research in Science, Communication and Technology, 2022, Vol. 2(1), pp. 229–234. DOI: https://doi.org/10.48175/IJARSCT-7677.

Li S., Chen L., Song C., Liu X. Text Classification Based on Knowledge Graphs and Improved Attention Mechanism [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.2401.03591.

Metzner C., Gao S., Herrmannova D., Lima-Walton E. Attention Mechanisms in Clinical Text Classification: A Comparative Evaluation, IEEE Journal of Biomedical and Health Informatics, 2024, Vol. 28(4), pp. 2247–2258. DOI: https://doi.org/10.1109/JBHI.2024.3355951.

Vallebueno A., Handan-Nader C., Manning C. D., Ho D. E. Statistical Uncertainty in Word Embeddings: GloVe-V [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.2406.12165.

Jarrahi A., Mousa R., Safari L. SLCNN: Sentence-Level Convolutional Neural Network for Text Classification [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.2301.11696.

Zhang R., Lee H., Radev D.R. Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents [Electronic resource]. Access mode: https://doi.org/10.18653/v1/N16-1177.

Kang L., He S., Long F., Wang M. Bilingual attention based neural machine translation, Applied Intelligence, 2022, Vol. 53(4), pp. 4302–4315. DOI: https://doi.org/10.1007/s10489-022-0356.

Terven J., Cordova-Esparza D. M., Ramirez-Pedraza A., Chavez-Urbiola E.A. Loss Functions and Metrics in Deep Learning [Electronic resource]. Access mode: https://doi.org/10.48550/arXiv.2307.02694.

Kozlenko M., Vialkova V. Software Defined Demodulation of Multiple Frequency Shift Keying with Dense Neural Network for Weak Signal Communications, IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET). Lviv-Slavske, Ukraine, 2020, pp. 590–595. DOI: https://doi.org/10.1109/TCSET49122.2020.235501.

Dutschmann T.M., Kinzel L., ter Laak A., et al. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation, Journal of Cheminformatics, 2023, Vol. 15(1), P. 49. DOI: https://doi.org/10.1186/s13321-023-00709-9.

Published

2025-12-24

How to Cite

Kuz, M. V., Lazarovych, I. M., Kozlenko, M. I., Pikuliak, M. V., & Kvasniuk, A. D. (2025). RESEARCH ON A HYBRID LSTM-CNN-ATTENTION MODEL FOR TEXT-BASED WEB CONTENT CLASSIFICATION. Radio Electronics, Computer Science, Control, (4), 105–115. https://doi.org/10.15588/1607-3274-2025-4-10

Issue

Section

Neuroinformatics and intelligent systems