Model-Based Reinforcement Learning with Continuous States and Actions
2008
Conference Paper
Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging, and approximate solutions are often inevitable. Gaussian process dynamic programming (GPDP) is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing-up task. Moreover, we compare our results on this RL task to a nearly optimal discrete dynamic programming (DP) solution in a fully known environment.
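The first step named in the abstract, building a GP model of the transition dynamics from observed transitions, can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover GPDP itself. Everything concrete here is an assumption made for illustration: the `pendulum_step` simulator and its physical constants, the sampling ranges, the squared-exponential kernel with fixed (not learned) hyperparameters, and the choice to regress on state differences.

```python
import numpy as np

def pendulum_step(state, torque, dt=0.05, g=9.81, length=1.0, mass=1.0, friction=0.05):
    """Hypothetical pendulum simulator standing in for the unknown system."""
    angle, velocity = state
    accel = (torque - friction * velocity
             - mass * g * length * np.sin(angle)) / (mass * length**2)
    velocity = velocity + dt * accel
    angle = angle + dt * velocity
    return np.array([angle, velocity])

def se_kernel(a, b, lengthscales, signal_var):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    diff = (a[:, None, :] - b[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff**2, axis=-1))

class GPDynamics:
    """GP regression from (angle, velocity, torque) to the one-step state change."""

    def __init__(self, inputs, targets, lengthscales, signal_var=1.0, noise_var=1e-4):
        self.inputs = inputs
        self.lengthscales = lengthscales
        self.signal_var = signal_var
        gram = se_kernel(inputs, inputs, lengthscales, signal_var)
        gram += noise_var * np.eye(len(inputs))
        self.chol = np.linalg.cholesky(gram)
        # alpha = K^{-1} y, computed through the Cholesky factor
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, targets))

    def predict(self, query):
        """Return the posterior mean and latent-function variance at the query rows."""
        k_star = se_kernel(query, self.inputs, self.lengthscales, self.signal_var)
        mean = k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = np.maximum(self.signal_var - np.sum(v**2, axis=0), 0.0)
        return mean, var

def sample_transitions(rng, n, low, high):
    """Draw random state-action pairs and record the resulting state changes."""
    inputs = rng.uniform(low, high, size=(n, 3))
    deltas = np.array([pendulum_step(x[:2], x[2]) - x[:2] for x in inputs])
    return inputs, deltas

# Training data: 300 random transitions (assumed ranges for angle, velocity, torque).
rng = np.random.default_rng(0)
train_x, train_y = sample_transitions(
    rng, 300, low=[-np.pi, -5.0, -2.0], high=[np.pi, 5.0, 2.0])

# Fixed, hand-picked hyperparameters; in practice these would be learned.
model = GPDynamics(train_x, train_y, lengthscales=np.array([1.0, 5.0, 4.0]))

# Held-out transitions from the interior of the training region.
test_x, true_deltas = sample_transitions(
    np.random.default_rng(1), 50, low=[-2.5, -4.0, -1.5], high=[2.5, 4.0, 1.5])
mean, var = model.predict(test_x)
mae = np.mean(np.abs(mean - true_deltas))
print("mean absolute error of predicted state change:", mae)
```

In this sketch the GP predicts the change in state rather than the next state directly, which keeps the targets close to the zero prior mean. The predictive variance is returned so that a planner built on top of the model could take model uncertainty into account.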
Author(s): | Deisenroth, MP. and Rasmussen, CE. and Peters, J. |
Book Title: | ESANN 2008 |
Journal: | Advances in Computational Intelligence and Learning: Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2008) |
Pages: | 19-24 |
Year: | 2008 |
Month: | April |
Editors: | Verleysen, M. |
Publisher: | d-side |
Department(s): | Empirical Inference |
Bibtex Type: | Conference Paper (inproceedings) |
Event Name: | European Symposium on Artificial Neural Networks |
Event Place: | Bruges, Belgium |
Address: | Evere, Belgium |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
BibTex:
@inproceedings{4977,
  title = {Model-Based Reinforcement Learning with Continuous States and Actions},
  author = {Deisenroth, MP. and Rasmussen, CE. and Peters, J.},
  journal = {Advances in Computational Intelligence and Learning: Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2008)},
  booktitle = {ESANN 2008},
  pages = {19-24},
  editors = {Verleysen, M.},
  publisher = {d-side},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Evere, Belgium},
  month = apr,
  year = {2008},
  doi = {},
  month_numeric = {4}
}