RESOURCE-EFFICIENT, ROBUST, AND ADAPTIVE OBJECT DETECTION IN UAV IMAGERY
DOI: https://doi.org/10.15588/1607-3274-2026-2-8
Keywords: object detection, robustness, adaptability, adversarial procedural noise, dynamic neural network
Abstract
Context. Ensuring robust, adaptable, and compute-efficient object detection in UAV aerial imagery is an urgent scientific task. The detector must cope with distribution shifts and with structured and unstructured noise, while staying within strict onboard latency and energy budgets. The study considers a compute-aware detector and a complementary training and adaptation method that integrate a dynamic transformer backbone with gate units, parameter-efficient adapters, and resource-bounded test-time adaptation to sustain accuracy under realistic perturbations and domain shift.
Objective. The objective is to develop a model and a method for object detection in aerial imagery that jointly provide robustness and adaptability while meeting the embedded compute and real-time constraints typical of onboard UAV systems.
Methods. The approach combines a dynamic neural network, built from a ViT-T/16 backbone with Gumbel-Softmax gate units, with a Simple FPN and a RetinaNet-like one-stage head. It further uses budget-aware losses that target a desired dynamic compression rate, structured procedural noise (Perlin, Gabor, Worley) for robustness training, LeakyReLU6 with a straight-through estimator for stable gradients, and test-time adaptation that updates lightweight adapters by minimizing an objectness-weighted marginal entropy.
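The abstract does not spell out the test-time objective, so the following is one plausible reading, stated as an assumption. It is modeled on MEMO-style marginal entropy. For each anchor, the class distributions predicted on several augmented views of the same frame are averaged. The entropy of that marginal is then weighted by the anchor's mean objectness, so anchors the detector considers background contribute little. The function and argument names are hypothetical, and only the loss value is computed here. In a real detector, the gradient of this value would be used to update the lightweight adapters.

```python
import math


def entropy(p: list[float], eps: float = 1e-12) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q + eps) for q in p)


def objectness_weighted_marginal_entropy(
        class_probs: list[list[list[float]]],  # [views][anchors][classes]
        objectness: list[list[float]],         # [views][anchors], values in [0, 1]
) -> float:
    """Hypothetical TTA loss: objectness-weighted mean of per-anchor marginal entropies."""
    n_views = len(class_probs)
    n_anchors = len(class_probs[0])
    n_classes = len(class_probs[0][0])
    num, den = 0.0, 0.0
    for a in range(n_anchors):
        # Marginal class distribution of this anchor: average over augmented views.
        marginal = [sum(class_probs[v][a][c] for v in range(n_views)) / n_views
                    for c in range(n_classes)]
        # Mean objectness across views acts as the anchor's weight.
        w = sum(objectness[v][a] for v in range(n_views)) / n_views
        num += w * entropy(marginal)
        den += w
    return num / den if den > 0 else 0.0  # no object-like anchors: nothing to adapt


if __name__ == "__main__":
    # Two augmented views, two anchors, two classes (toy numbers).
    probs = [[[0.9, 0.1], [0.6, 0.4]],
             [[0.8, 0.2], [0.3, 0.7]]]
    obj = [[0.9, 0.2],
           [0.8, 0.1]]
    print("TTA loss:", objectness_weighted_marginal_entropy(probs, obj))
```

Averaging over views before taking the entropy rewards predictions that are both confident and consistent under augmentation. Weighting by objectness keeps the many background anchors of a one-stage head from dominating the objective.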
Results. On VEDAI, a gated ViT-T/16 detector reaches mAP@0.5 of 0.77 at ~5.0 GFLOPs and 17.8 FPS, rising to 0.79 with adapters and TTA, whereas a static counterpart attains 0.74 at 9.6 GFLOPs and 10.6 FPS; pretraining with procedural noise lifts accuracy further to 0.80 (gated) and 0.82 (gated+TTA) with minimal compute overhead. Under domain shift (trained on VisDrone, evaluated on VEDAI), dynamic gating and TTA improve mAP from 0.54 to 0.60 without noise pretraining and up to 0.66 with it, sustaining ~5.4–5.6 GFLOPs and ~16–17 FPS within an 8–10 GFLOPs budget on 4×A76 CPUs.
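The headline VEDAI figures can be restated as relative savings with simple arithmetic. The sketch below only recomputes ratios from the numbers quoted above, comparing the gated ViT-T/16 detector without adapters or TTA against its static counterpart. The dictionary layout and the `compare` function are illustrative.

```python
# mAP@0.5, GFLOPs and FPS as reported in the Results paragraph (VEDAI).
STATIC = {"map50": 0.74, "gflops": 9.6, "fps": 10.6}
GATED = {"map50": 0.77, "gflops": 5.0, "fps": 17.8}


def compare(static: dict[str, float], gated: dict[str, float]) -> dict[str, float]:
    """Relative compute saving, speed-up, per-frame latency and mAP gain."""
    return {
        "gflops_saving": 1.0 - gated["gflops"] / static["gflops"],  # fraction of FLOPs removed
        "speedup": gated["fps"] / static["fps"],                    # throughput ratio
        "static_ms": 1000.0 / static["fps"],                        # latency per frame
        "gated_ms": 1000.0 / gated["fps"],
        "map_gain": gated["map50"] - static["map50"],
    }


if __name__ == "__main__":
    for key, value in compare(STATIC, GATED).items():
        print(f"{key}: {value:.3f}")
```

On these numbers, gating removes roughly 48% of the average GFLOPs and raises throughput by about 1.68×. That is a drop from about 94 ms to about 56 ms per frame, while mAP@0.5 rises by 0.03.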
Conclusions. The proposed object detection model and method, which combine dynamic gating, perturbation-aware training, and budgeted test-time adaptation, reduce average compute while increasing robustness and adaptability. They yield a superior accuracy-throughput trade-off for onboard UAV deployment under real-world disturbances and distribution shifts.
License
Copyright (c) 2026 V. V. Moskalenko, A. S. Moskalenko, Y. V. Moskalenko, A. V. Vatsenko

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.