Microsoft Research338 тыс
Популярные
Опубликовано 12 декабря 2022, 20:34
2022 Data-driven Optimization Workshop: End-to-end Reinforcement Learning for the Large-scale Traveling Salesman Problem
Speaker: Yan Jin, Huazhong University of Science and Technology
Traveling Salesman Problem (TSP) is one of the most studied routing problems that arise in the practical applications of logistics. Traditional approaches not only rely on hand-crafted rules of experts, but also are time-consuming on iterative search. This limits their applications in time sensitive scenarios, e.g., on-call routing and ride hailing service. We propose an end-to-end approach based on hierarchical reinforcement learning for addressing the large-scale TSP. Using a divide-and-conquer strategy, the upper-level policy chooses a small subset of cities from all remaining cities that are to be traversed, while the lower-level policy takes a Transformer model on the chosen cities to solve a shortest path with prescribed starting and ending cities. These two policies are jointly trained by reinforcement learning algorithms, and the TSP solutions can be directly generated without any search procedure. The proposed approach takes advantage of inference efficiency of Transformer model and provides highly competitive results.
Speaker: Yan Jin, Huazhong University of Science and Technology
Traveling Salesman Problem (TSP) is one of the most studied routing problems that arise in the practical applications of logistics. Traditional approaches not only rely on hand-crafted rules of experts, but also are time-consuming on iterative search. This limits their applications in time sensitive scenarios, e.g., on-call routing and ride hailing service. We propose an end-to-end approach based on hierarchical reinforcement learning for addressing the large-scale TSP. Using a divide-and-conquer strategy, the upper-level policy chooses a small subset of cities from all remaining cities that are to be traversed, while the lower-level policy takes a Transformer model on the chosen cities to solve a shortest path with prescribed starting and ending cities. These two policies are jointly trained by reinforcement learning algorithms, and the TSP solutions can be directly generated without any search procedure. The proposed approach takes advantage of inference efficiency of Transformer model and provides highly competitive results.
Свежие видео