A Smoothing Framework for Stochastic Continuous-time Reinforcement Learning Problem
Author: Bowen Hu
Release: 2021
Reinforcement learning has produced many breakthroughs in stochastic discrete-time and deterministic continuous-time systems, but stochastic continuous-time reinforcement learning remains an important yet understudied area. In this dissertation, I present a framework that adapts deterministic continuous-time temporal-difference (TD) learning methods to stochastic continuous-time systems. I first review TD methods for discrete-time and deterministic continuous-time settings. I then discuss a popular method for solving the optimal control problem and verify its accuracy on Merton's problem. Motivated by the fact that a stochastic system becomes arbitrarily close to its corresponding deterministic system as the variance term decreases to zero, I introduce a new nonparametric smoothing method that generalizes the deterministic continuous-time method to stochastic problems by shrinking the variance term of the stochastic process. In a numerical study of the stochastic pendulum, the smoothing method outperforms the traditional deterministic continuous-time TD method. Finally, I prove that the solution of the proposed framework converges to the corresponding deterministic continuous-time solution: if the optimal value function and optimal policy can be obtained by traditional deterministic algorithms, then applying the kernel smoothing framework with continuous-time TD guarantees convergence to the optimal value function or policy for the stochastic process.
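The variance-shrinking idea in the abstract can be sketched on a toy problem. The code below is a hypothetical illustration, not the dissertation's actual algorithm or examples: it assumes a scalar diffusion dx = -x dt + sigma dW with reward r(x) = -x^2, discount rate rho, and a quadratic value ansatz V(x) = w x^2 (all choices mine). A continuous-time TD(0) update is run on an Euler-Maruyama simulation whose diffusion coefficient is shrunk via `sigma_scale`; in the fully shrunk (deterministic) limit the HJB equation gives the closed-form weight w* = -1/(rho + 2), which the learned weight should approach.

```python
import numpy as np

def smoothed_td(sigma_scale, n_episodes=200, T=2.0, dt=0.01,
                rho=0.1, alpha=0.5, seed=0):
    """Continuous-time TD(0) on a variance-shrunk scalar diffusion.

    Hypothetical setup (not from the dissertation):
        dynamics  dx = -x dt + sigma_scale * dW
        reward    r(x) = -x^2
        ansatz    V(x) = w * x^2
    For sigma_scale -> 0 the deterministic HJB equation
        rho * V(x) = r(x) + V'(x) * (-x)
    yields the closed form w* = -1 / (rho + 2).
    """
    rng = np.random.default_rng(seed)
    w = 0.0
    n_steps = int(T / dt)
    for _ in range(n_episodes):
        x = rng.uniform(-2.0, 2.0)  # random start for exploration
        for _ in range(n_steps):
            # Euler-Maruyama step; the noise term carries the shrunk variance
            noise = sigma_scale * np.sqrt(dt) * rng.standard_normal()
            x_next = x + (-x) * dt + noise
            r = -x * x
            # continuous-time TD error: r + dV/dt - rho * V
            delta = r + (w * x_next**2 - w * x**2) / dt - rho * w * x**2
            # gradient step; dV/dw = x^2, scaled by dt as in continuous TD
            w += alpha * delta * dt * x**2
            x = x_next
    return w

w_hat = smoothed_td(sigma_scale=0.0)  # fully shrunk: deterministic limit
w_star = -1.0 / (0.1 + 2.0)           # analytic HJB weight for rho = 0.1
```

Running the same routine with a small positive `sigma_scale` illustrates the framework's premise: as the diffusion term shrinks, the stochastic TD solution tracks the deterministic one, so deterministic continuous-time TD machinery can be reused on the smoothed process.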