On Planning and Exploration in Non-discrete Environments
Abstract: "The application of reinforcement learning to control problems has received considerable attention in the last few years [And86, Bar89, Sut84]. In general there are two principles to solve reinforcement learning problems: direct and indirect techniques, both having their advantages and disadvantages. We present a system that combines both methods [TML91, TML90]. By interaction with an unknown environment a world model is progressively constructed using the backpropagation algorithm. For optimizing actions with respect to future reinforcement planning is applied in two steps: An experience network proposes a plan which is subsequently optimized by gradient descent with a chain of model networks.
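
The two-step planning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: `model_step` is a hand-written stand-in for a learned model network, the fixed initial plan stands in for the experience network's proposal, and the squared distance to a goal state stands in for the reinforcement signal. The sketch chains one copy of the model per planned action, propagates the cost gradient backwards through that chain, and adjusts the actions by gradient descent.

```python
import math


def model_step(state, action):
    """Hypothetical stand-in for one learned model network: predicts the next state."""
    return state + math.tanh(action)


def model_jacobians(state, action):
    """Partial derivatives of model_step with respect to state and action."""
    return 1.0, 1.0 - math.tanh(action) ** 2


def rollout(state, plan):
    """Chain one model copy per planned action and return every predicted state."""
    states = [state]
    for action in plan:
        states.append(model_step(states[-1], action))
    return states


def plan_cost(state, plan, goal):
    """Assumed cost: squared distance between the predicted final state and the goal."""
    return (rollout(state, plan)[-1] - goal) ** 2


def plan_gradient(state, plan, goal):
    """Propagate the cost gradient backwards through the chain of model copies."""
    states = rollout(state, plan)
    d_state = 2.0 * (states[-1] - goal)  # d cost / d final state
    grads = [0.0] * len(plan)
    for t in reversed(range(len(plan))):
        d_next_d_state, d_next_d_action = model_jacobians(states[t], plan[t])
        grads[t] = d_state * d_next_d_action  # gradient for action t
        d_state = d_state * d_next_d_state    # pass gradient to the earlier state
    return grads


def optimize_plan(state, plan, goal, lr=0.1, steps=200):
    """Refine a proposed plan by plain gradient descent on the actions."""
    plan = list(plan)
    for _ in range(steps):
        grads = plan_gradient(state, plan, goal)
        plan = [a - lr * g for a, g in zip(plan, grads)]
    return plan


# Assumed proposal, standing in for the experience network's output.
initial_plan = [0.0, 0.0, 0.0]
refined_plan = optimize_plan(state=0.0, plan=initial_plan, goal=1.5)
```

In this toy setting the refined plan drives the predicted final state toward the goal. In the system the abstract describes, the model would instead be a network trained by backpropagation on observed transitions, and the starting plan would come from the experience network.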