Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning
Andreas Sedlmeier and Thomas Gabor and Thomy Phan and Lenz Belzner and Claudia Linnhoff-Popien. 1st International Symposium on Applied Artificial Intelligence (ISAAI), pages 74–78, 2019.
We consider the problem of detecting out-of-distribution (OOD) samples in deep reinforcement learning. In a value-based reinforcement learning setting, we propose to use uncertainty estimation techniques directly on the agent's value-estimating neural network to detect OOD samples. The focus of our work lies in analyzing the suitability of approximate Bayesian inference methods and related ensembling techniques that generate uncertainty estimates. Although prior work has shown that dropout-based variational inference techniques and bootstrap-based approaches can be used to model epistemic uncertainty, their suitability for detecting OOD samples in deep reinforcement learning remains an open question. Our results show that uncertainty estimation can be used to differentiate in- from out-of-distribution samples. Over the complete training process of the reinforcement learning agents, bootstrap-based approaches tend to produce more reliable epistemic uncertainty estimates than dropout-based approaches.
@inproceedings{sedlmeierISAAI19,
  author    = "Andreas Sedlmeier and Thomas Gabor and Thomy Phan and Lenz Belzner and Claudia Linnhoff-Popien",
  title     = "Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning",
  booktitle = "1st International Symposium on Applied Artificial Intelligence",
  year      = "2019",
  pages     = "74--78",
  location  = "Munich, Germany",
  publisher = "Digitale Welt",
  doi       = "10.1007/s42354-019-0238-z",
  url       = "https://link.springer.com/article/10.1007/s42354-019-0238-z",
  eprint    = "https://thomyphan.github.io/files/2019-isaai-preprint.pdf",
  abstract  = "We consider the problem of detecting out-of-distribution (OOD) samples in deep reinforcement learning. In a value based reinforcement learning setting, we propose to use uncertainty estimation techniques directly on the agent's value estimating neural network to detect OOD samples. The focus of our work lies in analyzing the suitability of approximate Bayesian inference methods and related ensembling techniques that generate uncertainty estimates. Although prior work has shown that dropout-based variational inference techniques and bootstrap-based approaches can be used to model epistemic uncertainty, the suitability for detecting OOD samples in deep reinforcement learning remains an open question. Our results show that uncertainty estimation can be used to differentiate in- from out-of-distribution samples. Over the complete training process of the reinforcement learning agents, bootstrap-based approaches tend to produce more reliable epistemic uncertainty estimates, when compared to dropout-based approaches."
}
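The paper itself ships no code; the following is a minimal PyTorch sketch of the two kinds of uncertainty estimators the abstract compares, MC dropout and a bootstrap-style ensemble, applied to a toy Q-network. All class and function names here are hypothetical, and the OOD threshold rule is illustrative rather than the paper's procedure.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small Q-value network; the dropout layers enable MC-dropout sampling."""
    def __init__(self, obs_dim, n_actions, hidden=64, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

@torch.no_grad()
def mc_dropout_uncertainty(q_net, obs, n_passes=50):
    """Epistemic uncertainty via MC dropout: variance of the Q-estimates
    over stochastic forward passes with dropout kept active."""
    q_net.train()  # keep dropout sampling on at inference time
    qs = torch.stack([q_net(obs) for _ in range(n_passes)])  # (passes, batch, actions)
    return qs.var(dim=0).mean(dim=-1)  # one scalar per state

@torch.no_grad()
def ensemble_uncertainty(q_nets, obs):
    """Bootstrap-style alternative: variance of the Q-estimates across
    independently trained ensemble members."""
    for q in q_nets:
        q.eval()
    qs = torch.stack([q(obs) for q in q_nets])
    return qs.var(dim=0).mean(dim=-1)

# Usage: calibrate a threshold on in-distribution states seen during
# training, then flag states whose uncertainty exceeds it as OOD.
obs = torch.randn(8, 4)
u_mc = mc_dropout_uncertainty(QNetwork(4, 2), obs)
u_ens = ensemble_uncertainty([QNetwork(4, 2) for _ in range(5)], obs)
is_ood = u_ens > u_ens.mean() + 2 * u_ens.std()  # illustrative threshold only
```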
Related Articles
- R. Müller et al., “Towards Anomaly Detection in Reinforcement Learning”, AAMAS BlueSky Ideas 2022
- A. Sedlmeier et al., “Uncertainty-Based Out-of-Distribution Classification in Deep Reinforcement Learning”, ICAART 2020