We propose Stable Emergent Policy (STEP) approximation, a distributed policy iteration scheme to stably approximate decentralized policies for partially observable and cooperative multi-agent systems. STEP offers a novel training architecture, where function approximation is used to learn from action recommendations of a decentralized planning algorithm. Planning is enabled by exploiting a training simulator, which is assumed to be available during centralized learning, and further enhanced by reintegrating the learned policies. We experimentally evaluate STEP in two challenging and stochastic domains, and compare its performance with state-of-the-art multi-agent reinforcement learning algorithms.
@inproceedings{ phanALA20,
author = "Thomy Phan and Lenz Belzner and Kyrill Schmid and Thomas Gabor and Fabian Ritz and Sebastian Feld and Claudia Linnhoff-Popien",
title = "A Distributed Policy Iteration Scheme for Cooperative Multi-Agent Policy Approximation",
year = "2020",
abstract = "We propose Stable Emergent Policy (STEP) approximation, a distributed policy iteration scheme to stably approximate decentralized policies for partially observable and cooperative multi-agent systems. STEP offers a novel training architecture, where function approximation is used to learn from action recommendations of a decentralized planning algorithm. Planning is enabled by exploiting a training simulator, which is assumed to be available during centralized learning, and further enhanced by reintegrating the learned policies. We experimentally evaluate STEP in two challenging and stochastic domains, and compare its performance with state-of-the-art multi-agent reinforcement learning algorithms.",
url = "https://ala2020.vub.ac.be/papers/ALA2020_paper_36.pdf",
eprint = "https://thomyphan.github.io/files/2020-ala.pdf",
booktitle = "12th Adaptive and Learning Agents Workshop",
}
Related Articles
- T. Phan et al., “Confidence-Based Curriculum Learning for Multi-Agent Path Finding”, AAMAS 2024
- T. Phan et al., “Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies”, AAMAS 2019
- T. Phan et al., “Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning”, IJCAI 2019
- T. Phan et al., “Memory Bounded Open-Loop Planning in Large POMDPs Using Thompson Sampling”, AAAI 2019
- T. Phan et al., “Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation”, AAMAS 2018
- T. Phan, “Emergence and Resilience in Multi-Agent Reinforcement Learning”, PhD Thesis
Relevant Research Areas