A New Hybrid Optimizer for Global Optimization Based on a Comparative Study Remarks of Classical Gradient Descent Variants

  • Mouad Touarsi, Ibn Tofail University
  • Driss Gretete, Ibn Tofail University
  • Abdelmajid Elouadi, Ibn Tofail University
Keywords: global numerical optimization, mono-objective, gradient descent variants, analytic hierarchy process, hybrid optimization, random search.

Abstract

In this paper, we present an empirical comparison of several Gradient Descent variants used to solve global optimization problems over large search domains. The aim is to identify which of them is most suitable for solving an optimization problem regardless of the features of the test function used. Five variants of Gradient Descent were implemented in the R language and tested on a benchmark of five test functions. We established the dependence between the choice of variant and the obtained performance using a chi-squared test on a sample of 120 experiments. The test functions vary in convexity and in the number of local minima, and are classified according to several criteria. We chose a range of values for each algorithm parameter. Results are compared in terms of accuracy and convergence speed. Based on the obtained results, we defined a priority of use for these variants and contributed a new hybrid optimizer. The new optimizer is tested on a benchmark of well-known test functions, and two real applications are proposed. Except for the classical gradient descent algorithm, only stochastic versions of these variants are considered in this paper.
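To make the kind of comparison described above concrete, the following R sketch is illustrative only and is not the authors' code: it contrasts plain gradient descent with a momentum variant on the sphere test function, then runs a chi-squared independence test on a hypothetical contingency table of variant versus outcome (the paper's actual test uses 120 experiments over five functions). The function names (sphere, gd, gd_momentum), step sizes, iteration budgets, and the counts in the table are assumptions chosen for the example.

  ## Convex benchmark function and its analytic gradient
  sphere <- function(x) sum(x^2)
  grad_sphere <- function(x) 2 * x

  ## Plain gradient descent (illustrative step size and iteration budget)
  gd <- function(x0, lr = 0.1, iters = 200) {
    x <- x0
    for (i in seq_len(iters)) x <- x - lr * grad_sphere(x)
    x
  }

  ## Gradient descent with momentum (one of the compared variants,
  ## written here deterministically for brevity)
  gd_momentum <- function(x0, lr = 0.1, beta = 0.9, iters = 200) {
    x <- x0
    v <- numeric(length(x0))
    for (i in seq_len(iters)) {
      v <- beta * v - lr * grad_sphere(x)
      x <- x + v
    }
    x
  }

  set.seed(1)
  x0 <- runif(5, -10, 10)   # random start in a large search domain
  c(plain = sphere(gd(x0)), momentum = sphere(gd_momentum(x0)))

  ## Chi-squared test of dependence between variant and outcome,
  ## on hypothetical counts of converged vs. failed runs
  outcomes <- matrix(c(18, 6, 22, 2), nrow = 2, byrow = TRUE,
                     dimnames = list(variant = c("plain", "momentum"),
                                     result  = c("converged", "failed")))
  chisq.test(outcomes)

Ranking the variants and building the hybrid optimizer would then proceed from such per-variant accuracy and convergence-speed measurements, aggregated with the analytic hierarchy process as described in the paper.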


Published
2021-07-10
How to Cite
Touarsi, M., Gretete, D., & Elouadi, A. (2021). A New Hybrid Optimizer for Global Optimization Based on a Comparative Study Remarks of Classical Gradient Descent Variants. Statistics, Optimization & Information Computing, 9(3), 630-664. https://doi.org/10.19139/soic-2310-5070-1005
Section
Research Articles