Anytime multi-agent path finding (MAPF) is a promising approach to scalable and collision-free path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths, i.e., a neighborhood, of the solution. Delay-based MAPF-LNS has demonstrated particular effectiveness in generating promising neighborhoods via seed agents, according to their delays. Seed agents are selected using handcrafted strategies or online learning, where the former relies on human intuition about underlying structures, while the latter conducts black-box optimization, ignoring any structure. In this paper, we propose Truncated Adaptive Counterfactual K-ranked LEarning (TACKLE) to select seed agents via informed online learning by leveraging handcrafted strategies as human intuition. We show theoretically that TACKLE dominates its handcrafted and black-box learning counterparts in the limit. Our experiments demonstrate cost improvements of at least 60% in instances with one thousand agents, compared with state-of-the-art anytime solvers.
@article{ phan1AAAI26,
author = "Thomy Phan and Shao-Hung Chan and Sven Koenig",
title = "Truncated Counterfactual Learning for Anytime Multi-Agent Path Finding",
year = "2026",
abstract = "Anytime multi-agent path finding (MAPF) is a promising approach to scalable and collision-free path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths, i.e., a neighborhood, of the solution. Delay-based MAPF-LNS has demonstrated particular effectiveness in generating promising neighborhoods via seed agents, according to their delays. Seed agents are selected using handcrafted strategies or online learning, where the former relies on human intuition about underlying structures, while the latter conducts black-box optimization, ignoring any structure. In this paper, we propose Truncated Adaptive Counterfactual K-ranked LEarning (TACKLE) to select seed agents via informed online learning by leveraging handcrafted strategies as human intuition. We show theoretically that TACKLE dominates its handcrafted and black-box learning counterparts in the limit. Our experiments demonstrate cost improvements of at least 60% in instances with one thousand agents, compared with state-of-the-art anytime solvers.",
url = "https://thomyphan.github.io/files/2026-aaai-preprint1.pdf",
eprint = "https://thomyphan.github.io/files/2026-aaai-preprint1.pdf",
location = "Singapore",
journal = "Proceedings of the AAAI Conference on Artificial Intelligence",
}
Related Articles
- T. Phan et al., “Spatially Grouped Curriculum Learning for Multi-Agent Path Finding”, AAAI 2026
- T. Phan et al., “Generative Curricula for Multi-Agent Path Finding via Unsupervised and Reinforcement Learning”, JAIR 2025
- T. Phan et al., “Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic”, AAAI 2025
- T. Phan et al., “Counterfactual Online Learning for Open-Loop Monte-Carlo Planning”, AAAI 2025
- T. Phan et al., “Adaptive Anytime Multi-Agent Path Finding using Bandit-Based Large Neighborhood Search”, AAAI 2024
- T. Phan et al., “Confidence-Based Curriculum Learning for Multi-Agent Path Finding”, AAMAS 2024
- S. Chan et al., “Anytime Multi-Agent Path Finding using Operator Parallelism in Large Neighborhood Search”, AAMAS 2024
Relevant Research Areas