DEEP REINFORCEMENT LEARNING OPTIMIZATION METHODS FOR TRAFFIC LIGHTS AT SIGNALIZED INTERSECTIONS

Authors

  • N. I. Boyko Lviv Polytechnic National University, Lviv, Ukraine
  • Y. L. Mokryk Lviv Polytechnic National University, Lviv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2025-4-21

Keywords:

reinforcement learning, signalized intersection, traffic control, proximal policy optimization, deep Q-learning

Abstract

Context. Intersections are the most critical areas of a road network: they see the largest number of collisions and the longest waiting times. Developing optimal methods for traffic light control at signalized intersections is necessary to improve traffic flow at existing urban intersections, reduce the likelihood of collisions and the time required to cross the intersection, and increase safety for drivers and pedestrians. Developing such an algorithm requires implementing and comparing different approaches in a simulated environment.
Objective. The aim of this study is to develop an effective deep reinforcement learning model for optimizing traffic light control at signalized intersections.
Method. A custom simulation environment compatible with the OpenAI Gym framework is designed, and two families of algorithms are compared: Deep Q-Networks and Proximal Policy Optimization. The algorithms are tested on a range of scenarios with both continuous and discrete action spaces, where the set of actions available to the agent is represented either by the different states of the traffic lights or by the lengths of the traffic light signal phases. During training, various hyperparameters are tuned, and two reward metrics are considered: average wait time and average queue length. The developed environment rewards the agent according to the chosen metric while penalizing it for any traffic rule violations.
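As an illustration of the interface just described, the following is a minimal, hypothetical sketch of a Gym-compatible intersection environment for the discrete-action case. The class name, the four-phase encoding, the penalty constant, and the toy queue dynamics are assumptions made for the sketch, not the paper's actual implementation.

```python
# Minimal sketch of a Gym-compatible intersection environment (discrete-action
# variant). The class name, phase encoding, penalty constant and toy queue
# dynamics are illustrative assumptions, not the authors' code.
import numpy as np
import gym
from gym import spaces

class TrafficLightEnv(gym.Env):
    N_PHASES = 4              # e.g. 0: NS-green, 1: NS-yellow, 2: EW-green, 3: EW-yellow
    VIOLATION_PENALTY = 10.0  # subtracted whenever a phase-change rule is broken

    def __init__(self, reward_metric="wait_time"):
        super().__init__()
        # Each action selects one of the traffic light phases.
        self.action_space = spaces.Discrete(self.N_PHASES)
        # Observation: queue length and accumulated wait time per approach.
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(8,), dtype=np.float32)
        self.reward_metric = reward_metric
        self.reset()

    def reset(self):
        self.queues = np.zeros(4, dtype=np.float32)  # vehicles per approach
        self.waits = np.zeros(4, dtype=np.float32)   # accumulated wait time
        self.phase = 0
        return self._obs()

    def step(self, action):
        # Toy rule: switching directly between opposing phases (skipping
        # yellow) is treated as a traffic rule violation.
        violation = abs(action - self.phase) == 2
        self.phase = action
        self.queues += np.random.poisson(0.3, size=4)      # random arrivals
        served = min(float(self.queues[self.phase]), 2.0)  # toy departures
        self.queues[self.phase] -= served
        self.waits += self.queues                          # queued cars keep waiting
        # Reward: negative of the chosen metric, minus the violation penalty.
        if self.reward_metric == "wait_time":
            reward = -float(self.waits.mean())
        else:  # "queue_length"
            reward = -float(self.queues.mean())
        if violation:
            reward -= self.VIOLATION_PENALTY
        return self._obs(), reward, False, {}

    def _obs(self):
        return np.concatenate([self.queues, self.waits])
```

Switching reward_metric between "wait_time" and "queue_length" corresponds, in spirit, to the two reward configurations the paper compares.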
Results. A detailed analysis of the test results of the Deep Q-Network and Proximal Policy Optimization algorithms is provided. In general, the Proximal Policy Optimization algorithms show more consistent improvement during training, while the Deep Q-Network algorithms suffer more from catastrophic forgetting. Changing the reward function lets the algorithms minimize different metrics during training. The developed simulation environment can be reused in the future for testing other types of algorithms on the same task, and it is much less computationally expensive than existing solutions. The results underline the need to study further methods of traffic light control that may be integrated with real-life traffic light systems for more optimal and safer traffic flow.
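A comparison like the one described could, for example, be reproduced with the stable-baselines3 library (an assumption; the paper does not name its training stack), training both algorithm families on the same environment with an identical budget:

```python
# Hypothetical DQN-vs-PPO comparison on the environment sketched above.
# Assumes a stable-baselines3 version compatible with the classic Gym API
# (or the environment wrapped accordingly); not the authors' training code.
from stable_baselines3 import DQN, PPO

for algo, name in [(DQN, "dqn"), (PPO, "ppo")]:
    env = TrafficLightEnv(reward_metric="wait_time")
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=100_000)  # identical budget for a fair comparison
    model.save(f"traffic_{name}")
```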
Conclusions. The study has provided a valuable comparison of different methods of traffic light control at a signalized urban intersection, tested different ways of rewarding models during training, and reviewed the effects this has on traffic flow. The developed environment was sufficiently simple for the purposes of the research, which is valuable given the large computational requirements of the models themselves, but it can be extended in the future with more complex simulation features, such as non-urban intersection types, a road network of interconnected intersections, and pedestrian crossings. Future work may refine the simulation environment, expand the range of algorithms considered, and explore models for vehicle control in addition to traffic light control.

Author Biographies

N. I. Boyko, Lviv Polytechnic National University, Lviv

PhD, Associate Professor, Associate Professor of the Department of Artificial Intelligence Systems

Y. L. Mokryk, Lviv Polytechnic National University, Lviv

Post-graduate student, Department of Artificial Intelligence Systems

References

Tran Q.-D., Bae S.-H. An Efficiency Enhancing Methodology for Multiple Autonomous Vehicles in an Urban Network Adopting Deep Reinforcement Learning, Appl. Sci., 2021, Vol. 11(4), P. 1514, Mode of access: https://doi.org/10.3390/app11041514.

Vyklyuk Ya., Nevinskyi D., Boyko N. GeoCity – a New Dynamic-Spatial Model of Urban Ecosystem, J. Geogr. Inst. Cvijic, 2023, Vol. 73(2), P. 187–203, Mode of access: https://doi.org/10.2298/IJGI2302187V.

CARLA Team. CARLA Simulator [Electronic resource]. Mode of access: https://carla.org/ (access date: 01.04.2025). Title from the screen.

Gutiérrez-Moreno R., Barea R., López-Guillén E., Araluce J., Bergasa L. M. Reinforcement Learning-Based Autonomous Driving at Intersections in CARLA Simulator, Sensors, 2022, Vol. 22(21), P. 8373, Mode of access: https://doi.org/10.3390/s22218373.

Činčurak D., Grbić R., Vranješ M., Vranješ D. Autonomous Vehicle Control in CARLA Simulator Using Reinforcement Learning, Int. Symp. ELMAR, 2024, pp. 311–316, Mode of access: https://doi.org/10.1109/ELMAR62909.2024.10694632.

Wu C., Parvate K., Vinitsky E., Bayen A. M. Flow: A Modular Learning Framework for Mixed Autonomy Traffic, IEEE Trans. Robot., 2022, Vol. 38(2), pp. 1270–1286.

Eclipse SUMO – Simulation of Urban Mobility [Electronic resource]. Mode of access: https://eclipse.dev/sumo/ (access date: 01.04.2025). Title from the screen.

Kothari P., Perone C., Bergamini L., Alahi A., Ondruska P. DriverGym: Democratising Reinforcement Learning for Autonomous Driving, ArXiv, 2021, Mode of access: https://doi.org/10.48550/arXiv.2111.06889.

OpenAI. Gym Beta [Electronic resource]. Mode of access: https://openai.com/research/openai-gym-beta (access date: 05.04.2025). Title from the screen.

Wang B., He Z., Sheng J., Chen Y. Deep Reinforcement Learning for Traffic Light Timing Optimization, Processes, 2022, Vol. 10(11), P. 2458, Mode of access: https://doi.org/10.3390/pr10112458.

Park S., Han E., Park S., Jeong H., Yun I. Deep Q-network-based traffic signal control models, PLOS ONE, 2021, Vol. 16(9), Mode of access: https://doi.org/10.1371/journal.pone.0256405.

Shi Y., Liu Y., Qi Y., Han Q. A Control Method with Reinforcement Learning for Urban Un-Signalized Intersection in Hybrid Traffic Environment, Sensors, 2022, Vol. 22(3), P. 779, Mode of access: https://doi.org/10.3390/s22030779.

Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction, Second edition. MIT Press, 2018, P. 552.

Schulman J., Wolski F., Dhariwal P., Radford A., Klimov O. Proximal Policy Optimization Algorithms, ArXiv, 2017, Mode of access: https://arxiv.org/abs/1707.06347.

OpenAI. Proximal Policy Optimization [Electronic resource]. Mode of access: https://openai.com/research/openai-baselines-ppo (access date: 11.04.2025). Title from the screen.

Published

2025-12-24

How to Cite

Boyko, N. I., & Mokryk, Y. L. (2025). DEEP REINFORCEMENT LEARNING OPTIMIZATION METHODS FOR TRAFFIC LIGHTS AT SIGNALIZED INTERSECTIONS. Radio Electronics, Computer Science, Control, (4), 233–245. https://doi.org/10.15588/1607-3274-2025-4-21

Issue

No. 4 (2025)

Section

Control in technical systems