RESOURCE-EFFICIENT, ROBUST, AND ADAPTIVE OBJECT DETECTION IN UAV IMAGERY

V. V. Moskalenko; A. S. Moskalenko; Y. V. Moskalenko; A. V. Vatsenko

doi:10.15588/1607-3274-2026-2-8

Authors

V. V. Moskalenko Sumy State University, Sumy, Ukraine
A. S. Moskalenko Sumy State University, Sumy, Ukraine
Y. V. Moskalenko Sumy State University, Sumy, Ukraine
A. V. Vatsenko Sumy State University, Sumy, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2026-2-8

Keywords:

object detection, robustness, adaptability, adversarial procedural noise, dynamic neural network

Abstract

Context. Ensuring robust, adaptable, and compute-efficient object detection in UAV aerial imagery under distribution shifts, structured and unstructured noise, and strict onboard latency/energy budgets is an urgent scientific task. The work considers a compute-aware detector and a complementary training/adaptation method that integrate a dynamic transformer backbone with gate units, parameter-efficient adapters, and resource-bounded test-time adaptation to sustain accuracy under realistic perturbations and domain shift.
Objective. To develop a model and a method for object detection in aerial imagery that jointly provide robustness and adaptability while meeting the embedded compute and real-time constraints typical of onboard UAV systems.
Methods. The approach combines dynamic neural networks with Gumbel-Softmax gate units over a ViT-T/16 backbone, a Simple FPN and a RetinaNet-like one-stage head, budget-aware losses that target a desired dynamic compression rate, structured procedural noise (Perlin, Gabor, Worley) for robustness training, LeakyReLU6 with a straight-through estimator for stable gradients, and test-time adaptation via objectness-weighted marginal-entropy minimization on lightweight adapters.
Results. On VEDAI, a gated ViT-T/16 detector reaches mAP@0.5 of 0.77 at ~5.0 GFLOPs and 17.8 FPS, rising to 0.79 with adapters and TTA, whereas a static counterpart attains 0.74 at 9.6 GFLOPs and 10.6 FPS; pretraining with procedural noise lifts accuracy further to 0.80 (gated) and 0.82 (gated+TTA) with minimal compute overhead. Under domain shift (trained on VisDrone, evaluated on VEDAI), dynamic gating and TTA improve mAP from 0.54 to 0.60 without noise pretraining and up to 0.66 with it, sustaining ~5.4–5.6 GFLOPs and ~16–17 FPS within an 8–10 GFLOPs budget on 4×A76 CPUs.
Conclusions. The proposed object detection model and method – combining dynamic gating, perturbation-aware training, and budgeted test-time adaptation – reduce average compute while increasing robustness and adaptability, yielding a superior accuracy-throughput trade-off for UAV onboard deployment under real-world disturbances and distribution shifts.
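
Illustrative sketch. The Methods entry names Gumbel-Softmax gate units and budget-aware losses but does not give their exact form, so the following pure-Python sketch only shows how such a gate could make a skip/execute decision for one block and how a budget term could penalise deviation from a target execution rate. The function names, the two-way (skip, execute) logits, the temperature tau, and the squared-gap penalty are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def gumbel_softmax_gate(logits, tau=1.0, rng=random):
    """Sample a relaxed choice over [skip, execute] and its hard one-hot version.

    `logits` are two unnormalised scores from a hypothetical gate head.
    Returns (soft, hard). In an autodiff framework, a straight-through
    estimator would use `hard` in the forward pass and the gradient of
    `soft` in the backward pass; this sketch covers the forward pass only.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1), clamped for safety
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    # Hard decision: one-hot vector at the argmax of the relaxed sample
    k = soft.index(max(soft))
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return soft, hard


def budget_penalty(execute_decisions, target_rate):
    """Squared gap between the executed fraction of blocks and a target rate (assumed form)."""
    used = sum(execute_decisions) / len(execute_decisions)
    return (used - target_rate) ** 2


# Usage: gate four hypothetical blocks and score the result against a 50% budget.
rng = random.Random(0)
decisions = []
for block_logits in [[0.0, 2.0], [1.5, -0.5], [0.0, 0.0], [-1.0, 3.0]]:
    soft, hard = gumbel_softmax_gate(block_logits, tau=0.5, rng=rng)
    decisions.append(hard[1])  # 1.0 means the block is executed
penalty = budget_penalty(decisions, target_rate=0.5)
```

Lowering tau pushes the relaxed sample toward a one-hot vector, which is why such gates are commonly trained with a soft sample and deployed with the hard decision.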

Author Biographies

V. V. Moskalenko, Sumy State University, Sumy

PhD, Associate Professor, Associate Professor of the Computer Science Department

A. S. Moskalenko, Sumy State University, Sumy

PhD, Associate Professor, Associate Professor of the Computer Science Department

Y. V. Moskalenko, Sumy State University, Sumy

Post-graduate student

A. V. Vatsenko, Sumy State University, Sumy

Post-graduate student

