Biased Exploration in Offline Hierarchical Reinforcement Learning
A way of giving prior knowledge to a reinforcement learning agent is through a task hierarchy. When data are collected for offline learning with a task hierarchy, the structure of the hierarchy determines the distribution of the data. In some cases, the hierarchy structure skews the data distribution so that learning an effective policy from the collected data requires many samples. In this thesis, we address this problem. First, we determine the conditions under which the hierarchy structure causes some actions to be sampled with low probability, and describe when this sampling distribution delays convergence. Second, we present three biased sampling algorithms that address the problem. These algorithms employ the novel strategy of exploring a different hierarchical MDP than the one in which the policy is to be learned. Exploring in these new MDPs improves the sampling distribution and the rate at which the learned policy converges to optimal in the original MDP. Finally, we evaluate all of our methods, along with several baselines, on a range of reinforcement learning problems. Our experiments show that our methods outperform the baselines, often significantly, when the hierarchy has a problematic structure. They also identify trade-offs between the proposed methods and suggest scenarios in which each method should be used.
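The skew the abstract refers to can be illustrated with a small sketch. The Python toy below (the hierarchy, subtask names, and the size-proportional reweighting are illustrative assumptions, not the algorithms proposed in the thesis) shows how uniform exploration over subtasks under-samples primitive actions that sit inside large subtasks, and how biasing subtask selection toward those subtasks flattens the resulting action distribution in the offline dataset.

```python
import random
from collections import Counter

# Hypothetical two-level hierarchy: the root picks a subtask, the subtask
# picks one of its primitive actions. Names and sizes are illustrative only.
HIERARCHY = {
    "navigate": ["north", "south", "east", "west"],  # broad subtask
    "pickup":   ["grasp"],                           # narrow subtask
}

def collect(num_samples, subtask_probs):
    """Sample primitive actions by first sampling a subtask, then an action
    uniformly within it; return counts of primitive actions in the dataset."""
    counts = Counter()
    subtasks = list(HIERARCHY)
    weights = [subtask_probs[s] for s in subtasks]
    for _ in range(num_samples):
        subtask = random.choices(subtasks, weights=weights)[0]
        action = random.choice(HIERARCHY[subtask])
        counts[action] += 1
    return counts

if __name__ == "__main__":
    random.seed(0)
    n = 10_000
    # Uniform exploration over subtasks: 'grasp' receives ~50% of the samples
    # while each navigation action receives only ~12.5%, i.e. the hierarchy
    # structure skews the collected data.
    uniform = collect(n, {"navigate": 0.5, "pickup": 0.5})
    # Reweighting subtask selection by subtask size flattens the distribution
    # over primitive actions (each action now receives ~20% of the samples).
    biased = collect(n, {"navigate": 4 / 5, "pickup": 1 / 5})
    print("uniform :", dict(uniform))
    print("biased  :", dict(biased))
```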