Q-Learning-Assisted Simulated Annealing for Traveling Salesman Problem Optimization
Keywords:
Combinatorial Optimization, Simulated Annealing, Traveling Salesman Problem, Q-learning
Abstract
Simulated Annealing (SA) is a well-established metaheuristic for combinatorial optimization problems, inspired by the physical process of annealing in metallurgy. In the optimization context, SA iteratively explores the solution space by accepting not only improving solutions but also, with a temperature-dependent probability, non-improving ones. This mechanism enables the algorithm to escape local optima, thereby enhancing its ability to approach the global minimum of an objective function. Nevertheless, its overall performance is sensitive to the choice of the cooling schedule and is limited by the use of fixed neighborhood structures. In this work, we integrate Q-learning into the SA framework to improve its flexibility. Q-learning is a model-free, value-based reinforcement learning method that enables an agent to learn optimal action-selection policies by iteratively updating Q-values using rewards obtained through exploration of the environment. The proposed approach uses a learned Q-policy to dynamically choose a leader solution from a predefined set of candidate solutions, which are updated over the iterations, and thereby directs the search toward more promising regions. The Q-values are updated according to the relative improvement each leader provides over time, allowing adaptive exploitation of successful guides. Experimental results on popular benchmark instances of the Traveling Salesman Problem (TSP) from TSPLIB95 show that the Q-learning-guided SA achieves better solution quality than classical SA on most of the tested instances. These results indicate that experience-driven decision-making in reinforcement learning can enhance metaheuristic performance.
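The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the state space, the neighborhood move, the reward scaling, or any hyperparameters. The sketch therefore assumes a single-state Q-table with one value per leader, epsilon-greedy leader selection, a 2-opt segment reversal as the move, geometric cooling, and a reward equal to the relative reduction in tour length. All function names and parameter values (`q_guided_sa`, `n_leaders`, `alpha`, `gamma`, `epsilon`, `t0`, `cooling`) are hypothetical.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour under the distance matrix `dist`."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def two_opt_move(tour, rng):
    """Return a copy of `tour` with a random segment reversed (assumed move)."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def metropolis_accept(delta, temperature, rng):
    """Accept improvements always; accept worse moves with prob. exp(-delta/T)."""
    return delta <= 0 or rng.random() < math.exp(-delta / temperature)


def q_guided_sa(dist, n_leaders=4, iters=5000, t0=10.0, cooling=0.999,
                alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    # Pool of leader solutions, initialised as random permutations (assumption).
    leaders = [rng.sample(range(n), n) for _ in range(n_leaders)]
    costs = [tour_length(t, dist) for t in leaders]
    q = [0.0] * n_leaders  # single-state Q-table: one value per leader (assumption)
    best_cost = min(costs)
    best = leaders[costs.index(best_cost)][:]
    temperature = t0
    for _ in range(iters):
        # Epsilon-greedy choice of the leader that guides this iteration.
        if rng.random() < epsilon:
            a = rng.randrange(n_leaders)
        else:
            a = max(range(n_leaders), key=q.__getitem__)
        candidate = two_opt_move(leaders[a], rng)
        cand_cost = tour_length(candidate, dist)
        delta = cand_cost - costs[a]
        reward = 0.0
        if metropolis_accept(delta, temperature, rng):
            # Reward: relative improvement over the chosen leader (0 if worse).
            reward = max(0.0, -delta) / costs[a]
            leaders[a], costs[a] = candidate, cand_cost
            if cand_cost < best_cost:
                best, best_cost = candidate[:], cand_cost
        # Standard Q-learning update, collapsed to a single state.
        q[a] += alpha * (reward + gamma * max(q) - q[a])
        temperature *= cooling  # geometric cooling (assumption)
    return best, best_cost, q
```

Calling `q_guided_sa(dist)` with a symmetric distance matrix returns the best tour found, its length, and the learned per-leader Q-values. Reading a TSPLIB95 instance into `dist` is left out of the sketch.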
Published
2026-02-18
How to Cite
ADIL, N., EDDAOUDI, F., LAKHBAB, H., & NAIMI, M. (2026). Q-Learning-Assisted Simulated Annealing for Traveling Salesman Problem Optimization. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3028
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).