MULTI-SCALE TEMPORAL GAN-BASED METHOD FOR HIGH-RESOLUTION AND MOTION-STABLE VIDEO ENHANCEMENT

Authors

  • M. R. Maksymiv, Lviv Polytechnic National University, Lviv, Ukraine
  • T. Y. Rak, Lviv Polytechnic National University, Lviv, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2025-3-9

Keywords:

video enhancement, deep neural networks, generative adversarial networks, multiscale smoothing, temporal discriminator, motion stabilization

Abstract

Context. The problem of improving the quality of video images is relevant in many areas, including video analytics, film production, telemedicine and surveillance systems. Traditional video processing methods often lead to loss of detail, blurring and artifacts, especially when working with fast motion. Generative neural networks make it possible to preserve textural features and improve consistency between frames; however, existing methods fall short in maintaining temporal stability and in the quality of detail restoration.
Objective. The object of the study is the process of generating and enhancing video images using deep generative neural networks. The purpose of the work is to develop and investigate MST-GAN (Multi-Scale Temporal GAN), a model that preserves both the spatial and the temporal consistency of video by means of multi-scale feature alignment, optical flow regularization and a temporal discriminator.
Method. A new method based on the GAN architecture is proposed. It includes: multi-scale feature alignment (MSFA), which corrects shifts between neighboring frames at different levels of detail; a residual feature boosting module that restores details lost after alignment; optical flow regularization, which minimizes sudden changes in motion and prevents artifacts; and a temporal discriminator that learns to evaluate sequences of frames, ensuring consistent video without flickering and distortion.
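The role of the optical flow regularization component can be illustrated with a minimal sketch. The article's abstract does not state the exact loss, so the penalty below is an assumption for illustration only: it charges the generator for abrupt frame-to-frame changes in the estimated flow fields, which is the general shape of a temporal motion-smoothness term.

```python
import numpy as np

def flow_smoothness_penalty(flows: np.ndarray) -> float:
    """Penalize abrupt changes in motion between consecutive frames.

    `flows` holds one optical-flow field per frame pair, with shape
    (T, H, W, 2): a (dx, dy) displacement for every pixel.  The penalty
    is the mean squared temporal difference of the flow, so steady
    motion costs nothing while sudden jumps are penalized heavily.
    (Illustrative regularizer, not the paper's published loss.)
    """
    diffs = flows[1:] - flows[:-1]   # change of motion over time
    return float(np.mean(diffs ** 2))

# Steady motion (identical flow in every frame pair) incurs zero penalty:
steady = np.ones((4, 8, 8, 2))
print(flow_smoothness_penalty(steady))  # 0.0
```

In a full training loop such a term would be weighted and added to the adversarial and reconstruction losses, nudging the generator toward temporally stable output.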
Results. The proposed method was evaluated experimentally on several datasets and compared with modern analogues using the SSIM, PSNR and LPIPS metrics. The results show that it outperforms existing methods, providing better frame detail and more stable transitions between frames.
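Of the three metrics used in the evaluation, PSNR is simple enough to sketch directly (SSIM needs windowed local statistics and LPIPS a pretrained network). The standard definition, not code from the paper, is PSNR = 10·log10(MAX² / MSE):

```python
import math

def psnr(ref, test, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images.

    `ref` and `test` are equal-length flat sequences of pixel values;
    `max_val` is the dynamic range (255 for 8-bit images).
    Higher PSNR means the restored frame is closer to the reference.
    """
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# An error of the full dynamic range at every pixel gives exactly 0 dB:
print(psnr([0.0] * 16, [255.0] * 16))  # 0.0
```

Because PSNR is computed per frame, video methods typically report its average over all frames of a sequence, which is how such tables are usually read alongside SSIM and LPIPS.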
Conclusions. The proposed method provides improved video quality by combining accurate detail recovery with temporal frame consistency.

Author Biographies

M. R. Maksymiv, Lviv Polytechnic National University, Lviv, Ukraine

Postgraduate student, Assistant of the Department of Electronic Computing Machines

T. Y. Rak, Lviv Polytechnic National University, Lviv, Ukraine

Dr. Sc., Associate Professor, Professor at IT STEP University and Professor of the Department of Electronic Computing Machines

References

Sun D., Roth S., Black M. J. A quantitative analysis of current practices in optical flow estimation and the principles behind them, International Journal of Computer Vision (IJCV), 2014, Vol. 106, No. 2, pp. 115–137. DOI: 10.1007/s11263-013-0644-x.

Maksymiv M., Rak T. Method of Video Quality-Improving, Artificial Intelligence, 2023, Vol. 28, No. 3, pp. 47–62. DOI: 10.15407/jai2023.03.047.

Chu M., Xie Y., Mayer J., Dai B., Liu X. Learning Temporal Coherence via Self-Supervision for GAN-Based Video Generation, ACM Transactions on Graphics (TOG), 2020, Vol. 39, No. 4, P. 75. DOI: 10.1145/3386569.3392481.

Wang X., Chan K. C. K., Yu K., Dong C., Loy C. C. EDVR: Video restoration with enhanced deformable convolutional networks, IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 16–20 June 2019 : proceedings. Los Alamitos, IEEE, 2019, pp. 1954–1963. DOI: 10.1109/CVPR.2019.00206.

Wang Z., Bovik A. C., Sheikh H. R., Simoncelli E. P. Image quality assessment: From error visibility to structural similarity, IEEE Transactions on Image Processing, 2004, Vol. 13, No. 4, pp. 600–612. DOI: 10.1109/TIP.2003.819861.

Dabov K., Foi A., Katkovnik V., Egiazarian K. Image denoising by sparse 3D transform-domain collaborative filtering, IEEE Transactions on Image Processing, 2007, Vol. 16, No. 8, pp. 2080–2095. DOI: 10.1109/TIP.2007.901238.

Lehtinen J., Munkberg J., Hasselgren J., Laine S., Karras T., Aittala M., Aila T. Noise2Noise: Learning image restoration without clean data, International Conference on Machine Learning, Stockholm, 10–15 July 2018, proceedings. Stockholm, PMLR, 2018, pp. 2965–2974. DOI: 10.48550/arXiv.1803.04189.

Shannon C. E. A mathematical theory of communication, Bell System Technical Journal, 1948, Vol. 27, No. 3, pp. 379–423. DOI: 10.1002/j.1538-7305.1948.tb01338.x.

Teed Z., Deng J. RAFT: Recurrent all-pairs field transforms for optical flow, European Conference on Computer Vision, Glasgow, 23–28 August 2020 : proceedings. Berlin, Springer, 2020, pp. 402–419. DOI: 10.1007/978-3-030-58536-5_24.

Bao W., Lai W.-S., Ma C., Zhang X., Gao Z., Yang M.-H. Depth-aware video frame interpolation, IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 16– 20 June 2019 : proceedings. Los Alamitos, IEEE, 2019, pp. 3703–3712. DOI: 10.1109/CVPR.2019.00382.

Maksymiv M., Tymchenko O. Research on methods of image resolution increase, Science and Technology Today, 2024, Vol. 12, No. 40, pp. 1497–1508. DOI: 10.52058/2786-6025-2024-12(40)-1497-1508.

Dong C., Loy C. C., He K., Tang X. Image super-resolution using deep CNNs, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015, Vol. 38, No. 2, pp. 295–307. DOI: 10.1109/TPAMI.2015.2439281.

Shi W., Caballero J., Huszár F., Totz J., Aitken A. P., Bishop R., Wang Z. Real-time video super-resolution using an efficient sub-pixel convolutional network, IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 26 June – 1 July 2016 : proceedings. Los Alamitos, IEEE, 2016, pp. 1874–1883. DOI: 10.1109/CVPR.2016.207.

Jo Y., Oh T. W., Kang J., Kim S. J. Deep video superresolution using dynamic upsampling filters, IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18–23 June 2018 : proceedings. Los Alamitos, IEEE, 2018, pp. 3224–3232. DOI: 10.1109/CVPR.2018.00340.

Haris M., Shakhnarovich G., Ukita N. Recurrent backprojection network for video super-resolution, IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 16–20 June 2019 : proceedings. Los Alamitos, IEEE, 2019, pp. 3892–3901. DOI: 10.1109/CVPR.2019.00402.

Yoon S., Lee J., Kang S. TimeWarpGAN: A Temporal Consistency Framework for Video Enhancement, IEEE Transactions on Neural Networks and Learning Systems, 2021, Vol. 32, No. 6, pp. 2550–2562. DOI: 10.1109/TNNLS.2021.3067752.

Simonyan K., Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition, International Conference on Learning Representations (ICLR) [Electronic resource], 2015. Access mode: https://arxiv.org/abs/1409.1556.

He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition, IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 26 June – 1 July 2016 : proceedings. Los Alamitos, IEEE, 2016, pp. 770–778. DOI: 10.1109/CVPR.2016.90.

Nah S., Baik S., Hong S., Moon G., Son S., Timofte R., Lee K. M. NTIRE 2019 Challenge on Video Deblurring and Super-Resolution: Dataset and Study, IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, 16–20 June 2019 : proceedings. Los Alamitos, IEEE, 2019, pp. 0–0. DOI: 10.1109/CVPRW.2019.00009.

Xue T., Chen B., Wu J., Wei D., Freeman W. T. Video Enhancement with Task-Oriented Flow, International Journal of Computer Vision (IJCV), 2019, Vol. 127, pp. 1106–1125. DOI: 10.1007/s11263-018-1123-3.

Perazzi F., Pont-Tuset J., McWilliams B., Van Gool L., Gross M., Sorkine-Hornung A. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 26 June – 1 July 2016 : proceedings. Los Alamitos, IEEE, 2016, pp. 724–732. DOI: 10.1109/CVPR.2016.85.

Horé A., Ziou D. Image Quality Metrics: PSNR vs. SSIM, International Conference on Pattern Recognition, Istanbul, 23–26 August 2010 : proceedings. Los Alamitos, IEEE, 2010, pp. 2366–2369. DOI: 10.1109/ICPR.2010.579.

Zhang R., Isola P., Efros A. A., Shechtman E., Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18–23 June 2018 : proceedings. Los Alamitos, IEEE, 2018, pp. 586–595. DOI: 10.1109/CVPR.2018.00068.

Published

2025-09-22

How to Cite

Maksymiv, M. R., & Rak, T. Y. (2025). MULTI-SCALE TEMPORAL GAN-BASED METHOD FOR HIGH-RESOLUTION AND MOTION-STABLE VIDEO ENHANCEMENT. Radio Electronics, Computer Science, Control, (3), 86–95. https://doi.org/10.15588/1607-3274-2025-3-9

Issue

Section

Neuroinformatics and intelligent systems