@article{phanAAAI2019,
author = "Thomy Phan and Lenz Belzner and Marie Kiermeier and Markus Friedrich and Kyrill Schmid and Claudia Linnhoff-Popien",
title = "Memory Bounded Open-Loop Planning in Large POMDPs Using Thompson Sampling",
year = "2019",
abstract = "State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.",
url = "https://ojs.aaai.org/index.php/AAAI/article/view/4794",
eprint = "https://thomyphan.github.io/files/2019-aaai.pdf",
journal = "Proceedings of the AAAI Conference on Artificial Intelligence",
pages = "7941--7948",
doi = "10.1609/aaai.v33i01.33017941"
}
Related Articles
- T. Phan et al., “Counterfactual Online Learning for Open-Loop Monte-Carlo Planning”, AAAI 2025
- T. Phan et al., “Adaptive Anytime Multi-Agent Path Finding using Bandit-Based Large Neighborhood Search”, AAAI 2024
- T. Phan et al., “A Distributed Policy Iteration Scheme for Cooperative Multi-Agent Policy Approximation”, ALA 2020
- T. Phan et al., “Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies”, AAMAS 2019
- T. Phan et al., “Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning”, IJCAI 2019
- T. Gabor et al., “Subgoal-Based Temporal Abstraction in Monte-Carlo Tree Search”, IJCAI 2019
- T. Gabor et al., “Preparing for the Unexpected: Diversity Improves Planning Resilience in Evolutionary Algorithms”, ICAC 2018
- T. Phan, “Emergence and Resilience in Multi-Agent Reinforcement Learning”, PhD Thesis