Learning Collaborative Policies to Solve NP-hard Routing Problems

10/26/2021
by Minsu Kim, et al.

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be applied to complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, leaving a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimal solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible while being dedicated to exploring the full combinatorial action space (i.e., sequences of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward that encourages it to find diverse solutions. The reviser, in turn, modifies each candidate solution generated by the seeder: it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. Thus, the reviser is trained to improve the candidate solution's quality by focusing on the reduced solution space (which is beneficial for exploitation). Extensive experiments demonstrate that the proposed two-policy collaboration scheme improves over single-policy DRL frameworks on various NP-hard routing problems, including TSP, prize collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP).
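
The partition-and-revise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learned seeder is replaced here by random permutations, and the learned reviser by a brute-force search over each short sub-tour with its two endpoints held fixed. All names (`revise_segment`, `revise_tour`, `solve`) and parameters (`segment_len`, `offset`, `num_seeds`, `rounds`) are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits `tour` in order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))


def path_length(coords, path):
    """Length of an open path (no edge back to the start)."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def revise_segment(coords, segment):
    """Stand-in for a learned reviser (assumption): keep both endpoints
    fixed and brute-force the best order of the interior nodes."""
    if len(segment) <= 3:
        return list(segment)  # nothing to reorder
    first, last = segment[0], segment[-1]
    best_mid = min(itertools.permutations(segment[1:-1]),
                   key=lambda mid: path_length(coords, (first, *mid, last)))
    return [first, *best_mid, last]


def revise_tour(coords, tour, segment_len=6, offset=0):
    """Partition the tour into consecutive sub-tours and revise each one.

    Rotating by `offset` moves the segment boundaries between rounds.
    Because endpoints stay fixed, the edges joining segments are unchanged,
    so the total tour length cannot increase.
    """
    rotated = tour[offset:] + tour[:offset]
    revised = []
    for start in range(0, len(rotated), segment_len):
        revised.extend(revise_segment(coords, rotated[start:start + segment_len]))
    return revised


def solve(coords, num_seeds=8, rounds=3, segment_len=6, seed=0):
    """Generate several seed tours, revise each repeatedly, keep the best."""
    rng = random.Random(seed)
    n = len(coords)
    best = None
    for _ in range(num_seeds):
        tour = list(range(n))
        rng.shuffle(tour)  # random seed tour; stand-in for a learned seeder
        for r in range(rounds):
            shift = (r * (segment_len // 2)) % n
            tour = revise_tour(coords, tour, segment_len, offset=shift)
        if best is None or tour_length(coords, tour) < tour_length(coords, best):
            best = tour
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    cities = [(rng.random(), rng.random()) for _ in range(30)]
    result = solve(cities)
    print(f"best tour length found: {tour_length(cities, result):.3f}")
```

Holding the endpoints of each sub-tour fixed is the design choice that lets the segments be revised independently (and, in principle, in parallel): each revision only changes edges inside its own segment.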


