FASTER OPTIMIZATION-BASED META-LEARNING ADAPTATION PHASE
DOI: https://doi.org/10.15588/1607-3274-2022-1-10

Keywords: few-shot learning, meta-learning, Model-Agnostic Meta-Learning, MAML, adaptation time, adaptation speed, optimization-based meta-learning

Abstract
Context. Neural networks require a large amount of annotated data to learn. Meta-learning algorithms propose a way to decrease the number of training samples to only a few. One of the most prominent optimization-based meta-learning algorithms is MAML. However, its adaptation to new tasks is quite slow. The object of study is the meta-learning process and the adaptation phase as defined by the MAML algorithm.
Objective. The goal of this work is to create an approach that makes it possible to: 1) increase the execution speed of the MAML adaptation phase; 2) improve MAML accuracy in certain cases. The results are reported on CIFAR-FS, a publicly available few-shot learning dataset.
Method. In this work an improvement to the MAML meta-learning algorithm is proposed. The meta-learning procedure is defined in terms of tasks. In the case of image classification, each task is to learn to classify images of new classes given only a few training examples. MAML defines two stages for the learning procedure: 1) adaptation to the new task; 2) meta-weights update. The full training procedure requires Hessian computation, which makes the method computationally expensive. After being trained, the network will typically be used for adaptation to new tasks and subsequent prediction on them. Thus, improving adaptation time is an important problem, and it is the one we focus on in this work. We introduce a lambda pattern, which restricts which weights in the network are updated during the adaptation phase. This approach allows us to skip certain gradient computations. The pattern is selected given an allowed quality degradation threshold parameter; among the patterns that satisfy the criterion, the fastest one is then selected. However, as discussed later, quality improvement is also possible in certain cases through careful pattern selection.
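As an illustration of the adaptation scheme described above, the following is a minimal PyTorch-style sketch, not the authors' implementation. It shows how a per-layer lambda pattern could restrict the adaptation phase to a subset of layers, and how a pattern could be chosen under an allowed accuracy-drop threshold. The names adapt_with_pattern, select_pattern, the model.layers attribute, and the max_drop parameter are illustrative assumptions introduced here.

```python
# Hedged sketch of lambda-pattern adaptation: the pattern is assumed to be a
# list of 0/1 flags, one per layer; only flagged layers are updated during the
# adaptation (inner-loop) phase, so parameter-gradient computations for the
# remaining layers are skipped.

import torch
import torch.nn.functional as F


def adapt_with_pattern(model, support_x, support_y,
                       lambda_pattern, inner_lr=0.01, steps=1):
    """Adaptation phase: update only the layers flagged by the lambda pattern.

    Assumes the model exposes its layers as `model.layers` and that at least
    one layer is selected by the pattern.
    """
    # Disable gradients for layers excluded by the pattern.
    for flag, layer in zip(lambda_pattern, model.layers):
        for p in layer.parameters():
            p.requires_grad_(bool(flag))

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=inner_lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(support_x), support_y)
        loss.backward()   # gradients are computed only for the selected layers
        optimizer.step()
    return model


def select_pattern(candidate_patterns, evaluate, baseline_accuracy, max_drop=0.01):
    """Among patterns whose accuracy drop stays within `max_drop`,
    return the one with the smallest adaptation time."""
    admissible = []
    for pattern in candidate_patterns:
        # `evaluate` is assumed to return (accuracy, adaptation time)
        # measured on validation tasks for the given pattern.
        accuracy, adaptation_time = evaluate(pattern)
        if accuracy >= baseline_accuracy - max_drop:
            admissible.append((adaptation_time, pattern))
    return min(admissible, key=lambda t: t[0])[1] if admissible else None
```

In such a setup the layers excluded by the pattern keep their meta-learned weights unchanged, and the skipped gradient computations are the source of the adaptation-time savings discussed above.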
Results. The MAML algorithm with lambda pattern adaptation has been implemented, trained and tested on the open CIFAR-FS dataset. This makes our results easily reproducible.
Conclusions. The experiments conducted have shown that lambda adaptation pattern selection significantly improves the MAML method: adaptation time has been decreased by a factor of 3 with minimal accuracy loss. Interestingly, accuracy for one-step adaptation has also been substantially improved by using lambda patterns. A prospect for further research is to investigate a more robust automatic pattern selection scheme.
References
He K., Zhang X., Ren S. et al. Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016. Las Vegas, NV, USA, June 27–30, 2016. IEEE Computer Society, 2016, pp. 770–778. DOI: 10.1109/CVPR.2016.90.
Deng J., Dong W., Socher R. et al. ImageNet: A large-scale hierarchical image database, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20–25 June 2009. Miami, Florida, USA, IEEE Computer Society, 2009, pp. 248–255. DOI: 10.1109/CVPR.2009.5206848.
Huang G., Liu Z., Maaten L. et al. Densely Connected Convolutional Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Honolulu, HI, USA, July 21–26, 2017, IEEE Computer Society, 2017, pp. 2261–2269. DOI: 10.1109/CVPR.2017.243.
Zagoruyko S., Komodakis N. Wide Residual Networks, Proceedings of the British Machine Vision Conference (BMVC). BMVA Press, 2016, pp. 87.1–87.12. DOI: 10.5244/C.30.87.
Finn C., Abbeel P., Levine S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, Proceedings of Machine Learning Research. PMLR, 2017, Vol. 70, pp. 1126–1135.
Rajeswaran A., Finn C., Kakade S. et al. Meta-Learning with Implicit Gradients, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8–14, 2019. Vancouver, BC, Canada, 2019, pp. 113–124.
Khabarlak K., Koriashkina L. Fast Facial Landmark Detection and Applications: A Survey [Electronic resource], arXiv:2101.10808 [cs], 2021. Access mode: https://arxiv.org/abs/2101.10808
Bertinetto L., Henriques J., Torr P. et al. Meta-learning with differentiable closed-form solvers [Electronic resource], 7th International Conference on Learning Representations, ICLR 2019. New Orleans, LA, USA, May 6–9, 2019. Access mode: https://openreview.net/forum?id=HyxnZh0ct7.
Snell J., Swersky K., Zemel R. S. Prototypical Networks for Few-shot Learning, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4–9, 2017. Long Beach, CA, USA, 2017, pp. 4077–4087.
Ravi S., Larochelle H. Optimization as a Model for Few-Shot Learning [Electronic resource], 5th International Conference on Learning Representations, ICLR 2017. Toulon, France, April 24–26, 2017, Conference Track Proceedings. Access mode: https://openreview.net/forum?id=rJY0-Kcll.
Antoniou A., Edwards H., Storkey A. J. How to train your MAML [Electronic resource], 7th International Conference on Learning Representations, ICLR 2019. New Orleans, LA, USA, May 6–9, 2019. Access mode: https://openreview.net/forum?id=HJGven05Y7.
Weng L. Meta-Learning: Learning to Learn Fast [Electronic resource]. Access mode: https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html.
Yin W. Meta-learning for Few-shot Natural Language Processing: A Survey [Electronic resource], CoRR, 2020, Vol. abs/2007.09604. Access mode: https://arxiv.org/abs/2007.09604.
Wang Y., Yao Q., Kwok J. et al. Generalizing from a Few Examples: A Survey on Few-shot Learning, ACM Comput. Surv., 2020, Vol. 53, No. 3, pp. 63:1–63:34. DOI: 10.1145/3386252.
Guo Y., Zhang L. One-shot Face Recognition by Promoting Underrepresented Classes [Electronic resource], CoRR, 2017, Vol. abs/1707.05574. Access mode: http://arxiv.org/abs/1707.05574.
Koch G., Zemel R., Salakhutdinov R. Siamese neural networks for one-shot image recognition, ICML deep learning workshop. Lille, 2015, Vol. 2.
Vinyals O., Blundell C., Lillicrap T. et al. Matching Networks for One Shot Learning, Advances in Neural Information Processing Systems 29, Annual Conference on Neural Information Processing Systems 2016, December 5–10, 2016. Barcelona, Spain, 2016, pp. 3630–3638.
Santoro A., Bartunov S., Botvinick M. et al. Meta-Learning with Memory-Augmented Neural Networks, Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016, JMLR Workshop and Conference Proceedings. JMLR.org, 2016, Vol. 48, pp. 1842–1850.
Lake B. M., Salakhutdinov R., Tenenbaum J. B. Human-level concept learning through probabilistic program induction, Science, 2015, Vol. 350, No. 6266, pp. 1332–1338. DOI: 10.1126/science.aab3050.
Nichol A., Achiam J., Schulman J. On First-Order Meta-Learning Algorithms [Electronic resource], CoRR, 2018, Vol. abs/1803.02999. Access mode: http://arxiv.org/abs/1803.02999.
Li Z., Zhou F., Chen F. et al. Meta-SGD: Learning to Learn Quickly for Few Shot Learning [Electronic resource], CoRR, 2017, Vol. abs/1707.09835. Access mode: http://arxiv.org/abs/1707.09835.
Zeiler M.D., Fergus R. Visualizing and Understanding Convolutional Networks, Computer Vision, ECCV 2014, 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I, Lecture Notes in Computer Science. Springer, 2014, Vol. 8689, pp. 818–833. DOI: 10.1007/978-3-319-10590-1_53.
Ioffe S., Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015, JMLR Workshop and Conference Proceedings. JMLR.org, 2015, Vol. 37, pp. 448–456.
Kingma D. P., Ba J. Adam: A Method for Stochastic Optimization, 3rd International Conference on Learning Representations, ICLR 2015. San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, 2015.
Krizhevsky A. Learning multiple layers of features from tiny images [Electronic resource], University of Toronto, 2009, Access mode: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
License
Copyright (c) 2022 K. S. Khabarlak
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.