Optimizing Strategy Portfolios For Combinatorial Optimization With Deep Reinforcement Learning
Abstract
Solving combinatorial optimization problems is challenging due to their computational
complexity and the lack of efficient methods, especially within dynamic environments.
Various methods have been proposed in the literature to address these challenges, but they
usually require significant fine-tuning effort and the incorporation of extensive prior domain
knowledge. Recently, machine learning has emerged as a promising way to design
customized and efficient solution methods for combinatorial optimization problems. Many
studies have investigated the application of machine learning to optimize strategic
heuristics or metaheuristics, focusing on generating high-quality solutions. However, most
previous studies focus only on static problems and on a single aspect of optimization
algorithms, such as construction heuristics or improvement heuristics. The aim of this
study is to overcome that limitation by developing a new machine learning method that
adaptively selects the most efficient portfolio of strategies to generate high-quality
solutions in a reasonable runtime. This novel approach addresses complex, high-dimensional
optimization problems, improving both solution quality and time efficiency. Our proposed
method learns a portfolio selection policy using Proximal Policy
Optimization with CP-SAT sub-solvers for the Dynamic Job Shop Scheduling Problem.
The model is trained on small- to medium-sized instances and, in experiments with
stochastic job arrivals, surpasses benchmark techniques in minimizing total tardiness
across instance sizes up to large-scale instances. A sensitivity analysis further
identifies the most sensitive parameters and recommends the most suitable settings for an
effective training process.
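
To make the setting concrete, the following is a minimal, illustrative Python sketch (not the paper's implementation) of the solver side: a tiny job shop model with a total-tardiness objective is built with OR-Tools CP-SAT, and a discrete action selects one configuration from a hypothetical parameter portfolio, as a trained PPO policy would at deployment. The instance data (JOBS, DUE_DATES), the portfolio contents, and the helper solve_with_strategy are assumptions introduced here for illustration only.

from ortools.sat.python import cp_model

# Hypothetical toy instance: each job is a list of (machine, duration) operations
# with a due date; the study itself uses dynamic instances with stochastic arrivals.
JOBS = [
    [(0, 3), (1, 2)],  # job 0
    [(1, 4), (0, 2)],  # job 1
]
DUE_DATES = [4, 6]
HORIZON = sum(d for job in JOBS for _, d in job)

# Illustrative "portfolio" of CP-SAT parameter configurations acting as sub-solvers;
# the actual strategy set used in the paper is not specified in the abstract.
PORTFOLIO = [
    {"num_search_workers": 1, "linearization_level": 0},
    {"num_search_workers": 4, "linearization_level": 1},
    {"num_search_workers": 8, "linearization_level": 2},
]


def solve_with_strategy(action: int, time_limit: float = 5.0) -> float:
    """Builds a job shop model minimizing total tardiness and solves it with the
    CP-SAT configuration selected by `action` (the policy's discrete output)."""
    model = cp_model.CpModel()
    machine_to_intervals = {}
    job_ends = []
    for j, job in enumerate(JOBS):
        previous_end = None
        for t, (machine, duration) in enumerate(job):
            start = model.NewIntVar(0, HORIZON, f"start_{j}_{t}")
            end = model.NewIntVar(0, HORIZON, f"end_{j}_{t}")
            interval = model.NewIntervalVar(start, duration, end, f"iv_{j}_{t}")
            machine_to_intervals.setdefault(machine, []).append(interval)
            if previous_end is not None:
                model.Add(start >= previous_end)  # operations of a job run in order
            previous_end = end
        job_ends.append(previous_end)
    for intervals in machine_to_intervals.values():
        model.AddNoOverlap(intervals)  # a machine processes one operation at a time

    # Total tardiness: tard_j >= C_j - d_j and tard_j >= 0; minimization makes it tight.
    tardiness = []
    for j, end in enumerate(job_ends):
        tard = model.NewIntVar(0, HORIZON, f"tard_{j}")
        model.Add(tard >= end - DUE_DATES[j])
        tardiness.append(tard)
    model.Minimize(sum(tardiness))

    solver = cp_model.CpSolver()
    solver.parameters.max_time_in_seconds = time_limit
    for name, value in PORTFOLIO[action].items():
        setattr(solver.parameters, name, value)
    status = solver.Solve(model)
    assert status in (cp_model.OPTIMAL, cp_model.FEASIBLE)
    return solver.ObjectiveValue()


if __name__ == "__main__":
    # In the full method a PPO policy would map features of the current scheduling
    # state to `action`; here it is fixed to keep the sketch focused on the solver side.
    print("total tardiness:", solve_with_strategy(action=1))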