Q-Learning-Assisted Simulated Annealing for Traveling Salesman Problem Optimization

  • NOUHAILA ADIL Hassan II University
  • FAKHITA EDDAOUDI
  • HALIMA LAKHBAB
  • MOHAMED NAIMI
Keywords: Combinatorial Optimization, Simulated Annealing, Traveling Salesman Problem, Q-learning

Abstract

Simulated Annealing (SA) is a well-established metaheuristic for combinatorial optimization problems, inspired by the physical process of annealing in metallurgy. In the optimization context, SA iteratively explores the solution space by accepting not only improving solutions but also, with a temperature-dependent probability, non-improving ones. This mechanism enables the algorithm to escape local optima, thereby enhancing its ability to approach the global minimum of an objective function. Nevertheless, its performance is sensitive to the choice of cooling schedule and is limited by the use of fixed neighborhood structures. In this work, we integrate Q-learning into the SA framework to improve its flexibility. Q-learning is a model-free, value-based reinforcement learning method that enables an agent to learn optimal action-selection policies by iteratively updating Q-values using rewards obtained through exploration of the environment. The proposed approach steers the search toward more promising regions by using a learned Q-policy to dynamically select a leader solution from a predefined set of candidate solutions that is updated over the iterations. The Q-values are updated according to the relative improvement each leader provides over time, allowing adaptive exploitation of successful guides. Experimental results on popular benchmark instances of the Traveling Salesman Problem (TSP) from TSPLIB95 show that the Q-learning-guided SA achieves better solution quality than classical SA on most of the tested instances. These results demonstrate how experience-driven decision-making in reinforcement learning can enhance metaheuristic performance.
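The two building blocks named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the state definition, the reward formula, or the selection policy. The sketch assumes the textbook Metropolis acceptance rule, the standard tabular Q-learning update, an epsilon-greedy choice among leaders, and a reward equal to the relative tour-length improvement. All function names and the parameters `alpha`, `gamma`, and `epsilon` are hypothetical.

```python
import math
import random


def accept(delta, temperature, rng=random):
    """Metropolis rule: always accept an improving or equal move (delta <= 0);
    accept a worsening move with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def relative_improvement(old_cost, new_cost):
    """Assumed reward: fraction by which the tour length decreased."""
    return (old_cost - new_cost) / old_cost


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def select_leader(q_row, epsilon, rng=random):
    """Epsilon-greedy choice of a leader index (assumed policy):
    explore with probability epsilon, otherwise pick the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)
```

In an SA loop, one would call `select_leader` to pick a guide, generate a neighbor influenced by that guide, decide with `accept`, and then feed `relative_improvement` into `q_update`. How the leader actually shapes the neighbor move is not described in the abstract and is left open here.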
Published
2026-02-18
How to Cite
ADIL, N., EDDAOUDI, F., LAKHBAB, H., & NAIMI, M. (2026). Q-Learning-Assisted Simulated Annealing for Traveling Salesman Problem Optimization. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3028
Section
Research Articles