DEEP REINFORCEMENT LEARNING WITH SPARSE DISTRIBUTED MEMORY FOR “WATER WORLD” PROBLEM SOLVING
DOI: https://doi.org/10.15588/1607-3274-2021-1-14
Keywords: Deep Reinforcement Learning, DQN-algorithm, Sparse Distributed Memory, “Water World” problem.
Abstract
Context. Machine learning is one of the actively developing areas of data processing. Reinforcement learning is a class of machine learning methods in which the problem is to map sequences of environmental states to the agent’s actions. Significant progress in this area has been achieved with DQN-algorithms, which became one of the first classes of stable algorithms for learning with deep neural networks. The main disadvantage of this approach is the rapid growth of RAM consumption in real-world tasks. The approach proposed in this paper can partially solve this problem.
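For illustration, the sketch below shows a conventional DQN experience-replay buffer in which every transition is stored verbatim, so memory use grows with the number of stored transitions; the class name and capacity are assumptions for this sketch, not taken from the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Conventional DQN experience replay: every transition is stored as-is,
    so RAM usage grows with the number of stored transitions."""

    def __init__(self, capacity=1_000_000):  # capacity is an assumed, illustrative value
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Each transition is kept verbatim, regardless of how similar it is
        # to transitions already in memory.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch for training the deep Q-network.
        return random.sample(self.buffer, batch_size)
```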
Objective. The aim is to develop a method for forming the structure of, and access to, a sparse distributed memory with increased information content, in order to improve reinforcement learning without additional memory.
Method. A method is proposed for forming the structure of, and modifying, a sparse distributed memory that stores the actor’s previous transitions in the form of prototypes. The method increases the informativeness of the stored data and, as a result, improves the process of building a model of the studied process by intensifying the training of the deep neural network. The increased informativeness of the stored data results from the following sequence of actions. First, the new transition is compared with the last saved transition. To perform this comparison, the method introduces an estimate of the distance between transitions. If the distance between the new transition and the last saved transition is smaller than a specified threshold, the new transition is written in place of the previous one without increasing the amount of memory. Otherwise, a new prototype is created in memory while the prototype that has been stored in memory the longest is deleted.
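A minimal sketch of the described prototype-memory update follows, assuming transitions are encoded as flat numeric vectors and the distance estimate is Euclidean; the class, method, and parameter names are hypothetical and not taken from the paper.

```python
import numpy as np
from collections import deque


class PrototypeMemory:
    """Sparse distributed memory of transition prototypes.
    A new transition either overwrites the last stored prototype
    (if it is close enough to it) or becomes a new prototype,
    evicting the longest-stored one once capacity is reached."""

    def __init__(self, capacity, threshold):
        self.prototypes = deque(maxlen=capacity)  # deque evicts the oldest prototype automatically
        self.threshold = threshold                # distance threshold for treating transitions as duplicates

    @staticmethod
    def distance(t1, t2):
        # Assumed distance estimate: Euclidean distance between transitions
        # encoded as flat numeric vectors.
        return float(np.linalg.norm(np.asarray(t1, dtype=float) - np.asarray(t2, dtype=float)))

    def store(self, transition):
        if self.prototypes and self.distance(transition, self.prototypes[-1]) < self.threshold:
            # Too similar to the last saved transition: overwrite it,
            # so the amount of used memory does not grow.
            self.prototypes[-1] = transition
        else:
            # Sufficiently different: create a new prototype; when capacity
            # is exceeded, the longest-stored prototype is deleted.
            self.prototypes.append(transition)
```

As an illustrative usage, `mem = PrototypeMemory(capacity=10_000, threshold=0.1)` would keep at most 10,000 prototypes while discarding near-duplicate consecutive transitions; both values are placeholders rather than settings reported in the paper.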
Results. The proposed method was studied on the popular “Water World” test problem. The results showed a 1.5-fold increase in the actor’s survival time in a hostile environment. This result was achieved by increasing the informativeness of the stored data without increasing the amount of RAM.
Conclusions. The proposed method of forming and modifying the structure of sparse distributed memory made it possible to increase the informativeness of the stored data. As a result, reinforcement learning performance on the “Water World” problem was improved by increasing the accuracy of the model of the physical process represented by the deep neural network.