DEEP LEARNING MODELS FOR PREDICTING HUMAN MOVEMENT IN VIDEO STREAMS

Authors

  • N. V. Bilous, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  • V. O. Ivanichev, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2026-1-3

Keywords:

deep learning, object detection, motion trajectory, human trajectory prediction, video streams, graph neural networks, context-aware motion prediction, Stanford Drone Dataset, real-time inference

Abstract

Context. The problem of accurately predicting human movement in an environment is critical for applications in monitoring, search, and navigation systems. Existing approaches often struggle to integrate spatial and temporal dynamics of trajectories while processing real-time video streams.
Objective. The goal of this work is to develop a deep learning-based framework capable of predicting human motion by combining object-level features and spatio-temporal trajectory information extracted from video streams.
Method. The proposed method integrates YOLO11 for object detection, which extracts coordinates, velocity, movement direction, and position relative to the environment. A graph neural network models local and global relationships between environment nodes, aggregating features while considering terrain structure and obstacles. Spatio-temporal attention highlights the most relevant moments in the trajectory, enhancing prediction accuracy. The model processes sequences of frames from video streams to predict subsequent positions of each tracked object in real time.
Results. Experiments on video sequences with varying motion scenarios, trajectory lengths, and speed variations demonstrated high prediction accuracy. The proposed method effectively integrates spatial and temporal features, outperforming baseline models in tracking and motion prediction tasks.
Conclusions. The results confirm that the proposed deep learning framework is suitable for real-time human motion prediction in complex environments. Future research may focus on extending the approach to multi-agent scenarios, optimizing computational performance, and testing on larger and more diverse datasets.
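The Method paragraph above can be illustrated with a minimal sketch of its trajectory side: extracting per-frame motion features (position, velocity, direction) from tracked bounding-box centers and predicting the next position with a temporal attention over past steps. This is not the authors' implementation — the YOLO11 detector, the graph neural network over environment nodes, and the learned attention are omitted, and the recency-based attention scores here are purely illustrative.

```python
import numpy as np

def motion_features(centers):
    """Per-frame features from tracked bounding-box centers:
    position, velocity, and movement direction (unit vector).
    `centers` is a (T, 2) array of (x, y) positions."""
    centers = np.asarray(centers, dtype=float)
    velocity = np.diff(centers, axis=0)                  # (T-1, 2)
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    direction = np.divide(velocity, speed,
                          out=np.zeros_like(velocity),
                          where=speed > 0)
    return centers[1:], velocity, direction

def temporal_attention_predict(centers, tau=1.0):
    """Predict the next position as the current position plus an
    attention-weighted average of past velocities. Recent steps get
    higher weight (a hand-crafted stand-in for the paper's learned
    spatio-temporal attention)."""
    pos, vel, _ = motion_features(centers)
    scores = np.arange(len(vel), dtype=float) / tau      # recency scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over time
    expected_velocity = (weights[:, None] * vel).sum(axis=0)
    return pos[-1] + expected_velocity

# A track moving +2 px per frame along x:
track = [(0, 0), (2, 0), (4, 0), (6, 0)]
print(temporal_attention_predict(track))  # → [8. 0.]
```

In the full framework this per-object predictor would be conditioned on the graph neural network's aggregated environment features (terrain, obstacles, neighboring agents) rather than on the raw track alone.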

Author Biographies

N. V. Bilous, Kharkiv National University of Radio Electronics, Kharkiv

PhD, Professor

V. O. Ivanichev, Kharkiv National University of Radio Electronics, Kharkiv

Post-graduate student of the Software Engineering Department

References

Redmon J., Divvala S., Girshick R., Farhadi A. You Only Look Once: Unified, Real-Time Object Detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, Nevada, USA, June 27–30, 2016, 2016, pp. 779–788. DOI: 10.48550/arXiv.1506.02640.

Simonyan K., Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition, International Conference on Learning Representations, 2015, pp. 1–14. DOI: 10.48550/arXiv.1409.1556.

Bilous N., Malko V., Frohme M., Nechyporenko A. Comparison of CNN-Based Architectures for Detection of Different Object Classes, Artificial Intelligence, 2024, Vol. 5, No. 4, pp. 2300–2320. DOI: 10.3390/ai5040113.

Zhao Y., Lv W., Xu S., Wei J., Wang G., Dang Q., Liu Y., Chen J. DETRs Beat YOLOs on Real-time Object Detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Washington, USA, June 17–22, 2024, 2024, pp. 16965–16974. DOI: 10.48550/arXiv.2304.08069.

Redmon J., Farhadi A. YOLO9000: Better, Faster, Stronger, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA, July 21–26, 2017, 2017, pp. 6517–6525. DOI: 10.1109/CVPR.2017.690.

Li Y., Huang Y., Tao Q. Improving real-time object detection in Internet-of-Things smart city traffic with YOLOv8-DSAF method, Scientific Reports, 2024, Vol. 14, Article number: 17235, 15 p. DOI: 10.1038/s41598-024-68115-1.

Wang C.-Y., Bochkovskiy A., Liao H.-Y. M. Scaled-YOLOv4: Scaling Cross Stage Partial Network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, Tennessee, USA, June 20–25, 2021, 2021, pp. 13029–13038. DOI: 10.1109/CVPR46437.2021.01283.

Jocher G., Chaurasia A., Qiu J. YOLOv8: Overview, Ultralytics Documentation, 2023. DOI: 10.5281/zenodo.3908559.

Bilous N., Malko V., Moshenskyi N. Search and Detection of People in the Water Using YOLO Architectures: A Comparative Analysis from YOLOv3 to YOLOv8, Automation 2024: Advances in Automation, Robotics and Measurement Techniques. AUTOMATION 2024. Lecture Notes in Networks and Systems, Vol. 1219. Springer, Cham, pp. 233–255. DOI: 10.1007/978-3-031-78266-4_21.

Bilous N., Svidin O., Ahekian I., Malko V. A skeleton-based method for exercise recognition based on 3D coordinates of human joints, IAES International Journal of Artificial Intelligence (IJ-AI), 2024, pp. 1805–1816. DOI: 10.11591/ijai.v13.i2.pp1805-1816.

Bilous N., Ahekian I., Kaluhin V. Determination and Comparison Methods of Body Positions on Stream Video, Radio Electronics, Computer Science, Control, 2023, No. 2, pp. 52–60. DOI: 10.15588/1607-3274-2023-2-6.

Cao Z., Hidalgo G., Simon T., Wei S., Sheikh Y. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, Vol. 43, No. 1, pp. 172–186. DOI: 10.1109/TPAMI.2019.2929257.

Pavllo D., Feichtenhofer C., Grangier D., Auli M. 3D human pose estimation in video with temporal convolutions and semi-supervised training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, California, USA, June 16–20, 2019, 2019, pp. 7753–7762. DOI: 10.48550/arXiv.1811.11742.

Alahi A., Goel K., Ramanathan V., Robicquet A., Fei-Fei L., Savarese S. Social LSTM: Human Trajectory Prediction in Crowded Spaces, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA, June 27–30, 2016, 2016, pp. 961–971. DOI: 10.1109/CVPR.2016.110.

Veličković P., Cucurull G., Casanova A., Romero A., Lio P., Bengio Y. Graph Attention Networks, Proceedings of the International Conference on Learning Representations, Vancouver, Canada, April 30 – May 3, 2018, 2018. DOI: 10.17863/CAM.48429.

Yu B., Yin H., Zhu Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, July 13–19, 2018, 2018, pp. 3634–3640. DOI: 10.24963/ijcai.2018/505.

Mi J., Zhang X., Zeng H., Wang L. DERGCN: Dynamic-Evolving graph convolutional networks for human trajectory prediction, Neurocomputing, 2024, Vol. 569, Article 127117. DOI: 10.1016/j.neucom.2023.127117.

Huang F., Fan Z., Li X., Zhang W., Li P., Geng Y., Zhu K. Tailored meta-learning for dual trajectory transformer: advancing generalized trajectory prediction, Complex & Intelligent Systems, 2025, Vol. 11, Article no. 174. DOI: 10.1007/s40747-025-01802-2.

Chen S. et al. Adaptive Graph Transformer for Human Trajectory Prediction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Washington, USA, June 17–22, 2024, 2024, pp. 1617–1628.

Gupta A., Johnson J., Fei-Fei L., Savarese S., Alahi A. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA, June 18–22, 2018, 2018, pp. 2255–2264. DOI: 10.1109/CVPR.2018.00240.

Sadeghian A., Kosaraju V., Sadeghian A., Hirose N., Rezatofighi H., Savarese S. SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, California, USA, June 16–20, 2019, 2019, pp. 1349–1358. DOI: 10.1109/CVPR.2019.00144.

Bilous N., Kozhevnikov A. Research of Methods for Determining the Accuracy of Metrological Measurements, Technology Audit and Production Reserves, 2022, Vol. 3, No. 2(65), pp. 18–23. DOI: 10.15588/1607-3274-2022-2-3.

Bilous N., Tereshchenko I., Tereshchenko A., Bilous N., Shtangey S., Warsza Z. Risk Analysis Method by the Extreme Data of Dependent Exogenous Variables, Journal of Automation, Mobile Robotics and Intelligent Systems, 2022, pp. 44–53. DOI: 10.14313/JAMRIS/3-2021/18.

Kosaraju V., Martin-Martin R., Reid I., Rezatofighi S., Savarese S. Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks, Proceedings of the Advances in Neural Information Processing Systems, Vancouver, Canada, December 8–14, 2019, 2019, pp. 137–146. DOI: 10.5555/3454287.3454300.

Li R., Qiao T., Katsigiannis S., Zhu Z., Shum H. Unified Spatial-Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction, IEEE Transactions on Circuits and Systems for Video Technology, 2025, pp. 1–14. DOI: 10.1109/TCSVT.2025.3539522.

Stanford Drone Dataset, 2016. URL: https://cvgl.stanford.edu/projects/uav_data/

Published

2026-03-27

How to Cite

Bilous, N. V., & Ivanichev, V. O. (2026). DEEP LEARNING MODELS FOR PREDICTING HUMAN MOVEMENT IN VIDEO STREAMS. Radio Electronics, Computer Science, Control, (1), 29–37. https://doi.org/10.15588/1607-3274-2026-1-3

Section

Mathematical and computer modelling