A reinforcement learning—Variable neighborhood search method for the capacitated Vehicle Routing Problem

Kalatzantonakis, Panagiotis; Sifaleras, Angelo; Samaras, Nikolaos

Please use this identifier to cite or link to this item: https://ruomo.lib.uom.gr/handle/7000/1279

Title:	A reinforcement learning—Variable neighborhood search method for the capacitated Vehicle Routing Problem
Authors:	Kalatzantonakis, Panagiotis; Sifaleras, Angelo; Samaras, Nikolaos
Type:	Article
Subjects:	FRASCATI::Natural sciences::Mathematics::Applied Mathematics; FRASCATI::Natural sciences::Computer and information sciences
Keywords:	Reinforcement Learning; Multi-Armed Bandits; Intelligent Optimization; Bandit Learning; Metaheuristics; Variable Neighborhood Search; Vehicle Routing Problem
Issue Date:	2022
Publisher:	Elsevier
Source:	Expert Systems with Applications
First Page:	118812
Abstract:	Finding the sequence of local search operators that yields the best performance of Variable Neighborhood Search is an important open research question in the field of metaheuristics. This paper proposes a Reinforcement Learning method to address this question. We introduce a new hyperheuristic scheme, termed Bandit VNS, inspired by the Multi-armed Bandit, a particular type of single-state reinforcement learning problem. In Bandit VNS, we take the General Variable Neighborhood Search metaheuristic and enhance it with a hyperheuristic strategy. We examine several variations of the Upper Confidence Bound algorithm to create a reliable strategy for adaptive neighborhood selection. Furthermore, we use Adaptive Windowing, a state-of-the-art algorithm for estimating and detecting changes in a data stream. Bandit VNS is designed for effective parallelization and encourages cooperation between agents to produce the best solution quality. We demonstrate the advantages of this concept in accuracy and speed through extensive experimentation on the Capacitated Vehicle Routing Problem. We compare the performance of the novel scheme against the conventional General Variable Neighborhood Search metaheuristic in terms of CPU time and solution quality. The Bandit VNS method shows excellent results and reaches significantly higher performance metrics when applied to well-known benchmark instances. Our experiments show that our approach achieves an improvement of more than 25% in solution quality over the General Variable Neighborhood Search method on standard library instances of medium and large size.
URI:	https://doi.org/10.1016/j.eswa.2022.118812; https://ruomo.lib.uom.gr/handle/7000/1279
ISSN:	0957-4174
Other Identifiers:	10.1016/j.eswa.2022.118812
Appears in Collections:	Department of Applied Informatics
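
The abstract describes using Upper Confidence Bound variants for adaptive neighborhood selection. As a rough illustration of that idea only, the sketch below implements plain UCB1 over a set of arms, where each arm stands in for one local search operator. It is not the authors' Bandit VNS: the class name `UCB1Selector`, the helper `simulate`, the exploration weight, and the simulated Bernoulli rewards (1 when an operator "improves" the solution) are all assumptions made for this example, and Adaptive Windowing and parallel agents are not shown.

```python
import math
import random


class UCB1Selector:
    """Plain UCB1 over a fixed set of arms (hypothetical stand-ins for operators)."""

    def __init__(self, n_arms, c=math.sqrt(2)):
        self.counts = [0] * n_arms    # how often each arm was played
        self.means = [0.0] * n_arms   # running mean reward per arm
        self.c = c                    # exploration weight (assumed value)

    def select(self):
        # Play every arm once before using the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + self.c * math.sqrt(math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the mean reward for the played arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(success_probs, steps=2000, seed=0):
    """Run UCB1 against simulated operators with fixed improvement probabilities."""
    rng = random.Random(seed)
    bandit = UCB1Selector(len(success_probs))
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


if __name__ == "__main__":
    # Three hypothetical operators; the third improves the solution most often.
    print(simulate([0.1, 0.2, 0.8]))
```

In a real VNS loop the reward would come from the change in route cost after applying the chosen operator rather than from a random draw; the selection and update steps would stay the same.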

Files in This Item:

File	Description	Size	Format
A_Reinforcement_Learning_-_VNS_Method_for_the_CVRP.pdf Until 2024-09-16		1.36 MB	Adobe PDF

Institutional Repository of Academic Research, University of Macedonia
