SIMPLE, FAST AND SCALABLE RECOMMENDATION SYSTEMS VIA EXTERNAL KNOWLEDGE DISTILLATION
DOI: https://doi.org/10.15588/1607-3274-2025-3-12
Keywords: knowledge distillation, knowledge graphs, decoder-only models, node embeddings, transformer models, attention mechanism, recurrent neural networks, long short-term memory networks, deep neural networks, personalized sequential recommendations, predicting the next most relevant product, user modeling
Abstract
Context. Recommendation systems are important tools that help modern businesses generate additional revenue by proposing relevant goods to clients and by building customer loyalty. With the emergence of deep learning and the evolution of hardware capabilities, it became possible to capture customer behavioral patterns in a data-driven way. However, prediction accuracy depends on system complexity, and more complex models respond more slowly. The object of the study is the task of issuing sequential recommendations, namely predicting the next most relevant product, subject to restrictions on system response time.
Objective. The goal of the research is to synthesize a deep neural network that can retrieve relevant items for large numbers of users with minimal delay.
Method. The proposed method for building recommendation systems combines attention-based deep learning architectures with knowledge graphs, which enhance prediction quality by explicitly enriching the pool of recommendation candidates. The method demonstrates the benefits of decoder-only models and the knowledge distillation framework: the distilled student model shows strong performance on the recommendation retrieval task while responding quickly when processing large batches of users.
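The knowledge distillation objective mentioned above can be sketched as follows. This is an illustrative minimal version of the standard soft-target distillation loss (Hinton et al.), not the paper's actual implementation; the temperature, blending weight `alpha`, and variable names are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over item logits."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, true_item,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target cross-entropy (teacher -> student) with
    hard-target cross-entropy against the observed next item.

    student_logits, teacher_logits: (num_items,) scores for one user sequence.
    true_item: index of the item the user actually interacted with next.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft_loss = -np.sum(p_teacher * log_p_student) * temperature ** 2
    hard_loss = -np.log(softmax(student_logits)[true_item] + 1e-12)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

A heavy teacher (e.g., a graph-enriched attention model) scores the candidate pool offline; the lightweight student is then trained on this blended loss so that it can serve retrieval requests alone at inference time.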
Results. A recommender system model and a method for its training are proposed, combining the knowledge distillation paradigm with learning on knowledge graphs. The proposed method was implemented as a two-tower deep neural network that solves the recommendation retrieval problem. A system for predicting the most relevant proposals for the user has been built; it includes the proposed model and its training method, as well as the ranking metrics MAP@k and NDCG@k for assessing model quality. A program implementing the proposed recommendation system architecture has been developed and used to study the problem of issuing the most relevant proposals. Experiments on a large volume of real data from user visits to an online retail store showed that the proposed method yields highly relevant recommendations while remaining fast and undemanding of computing resources at inference time.
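The evaluation metrics named above, MAP@k and NDCG@k, are standard ranking measures and can be computed per user as sketched below (binary relevance is assumed here for simplicity; the paper's exact evaluation protocol may differ).

```python
import numpy as np

def ndcg_at_k(recommended, relevant, k=10):
    """NDCG@k for one user.

    recommended: ranked list of item ids produced by the model.
    relevant: set of item ids the user actually interacted with.
    """
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)  # best case: all hits ranked first
    idcg = sum(1.0 / np.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def map_at_k(recommended, relevant, k=10):
    """Average precision@k for one user; averaging over users gives MAP@k."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)  # precision at each hit position
    return score / min(len(relevant), k) if relevant else 0.0
```

Both metrics reward placing relevant items near the top of the list, which matches the retrieval setting where only the first few recommendations are shown to the user.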
Conclusions. The series of conducted experiments confirmed that the proposed system solves the problem effectively and in a short time, which is a strong argument in favor of its use in real conditions by large businesses that handle millions of visits per month and thousands of products. Prospects for further research on this topic include the use of other knowledge distillation methods, such as internal distillation or self-distillation, the use of deep learning architectures other than the attention mechanism, and the optimization of embedding vector storage.
Copyright (c) 2025 D. V. Androsov, N. I. Nedashkovskaya

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.