Training Reinforcement Neurocontrollers Using the Polytope Algorithm

12/03/1998
by A. Likas, et al.

A new training algorithm is presented for delayed reinforcement learning problems. The method does not assume the existence of a critic model; instead, it employs the polytope optimization algorithm to adjust the weights of the action network so that a simple, direct measure of training performance is maximized. Experimental results from applying the method to the pole balancing problem indicate improved training performance compared with critic-based and genetic reinforcement approaches.
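
To make the idea of critic-free, direct weight search concrete, the following is a minimal sketch of a polytope-style search, assuming the polytope algorithm here refers to the derivative-free simplex method commonly attributed to Nelder and Mead. It is a simplified variant (reflection, expansion, inside contraction, shrink), not the authors' implementation. The names `polytope_maximize` and `toy_score`, the coefficient values, and the quadratic stand-in objective are all assumptions for illustration; in the setting described above, the score would instead be some direct measure of how well the action network performs with a given weight vector.

```python
def polytope_maximize(score, w0, step=0.5, iters=200,
                      alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Simplified polytope (Nelder-Mead style) search that MAXIMIZES score(w).

    score: hypothetical callable mapping a weight list to a scalar performance.
    w0:    initial weight vector (list of floats).
    """
    n = len(w0)
    # Initial polytope: w0 plus n vertices, each offset along one coordinate.
    verts = [list(w0)]
    for i in range(n):
        v = list(w0)
        v[i] += step
        verts.append(v)
    vals = [score(v) for v in verts]

    for _ in range(iters):
        # Order vertices from best (highest score) to worst.
        order = sorted(range(n + 1), key=lambda k: vals[k], reverse=True)
        verts = [verts[k] for k in order]
        vals = [vals[k] for k in order]

        # Centroid of every vertex except the worst one.
        centroid = [sum(v[i] for v in verts[:-1]) / n for i in range(n)]
        worst = verts[-1]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        refl = along(alpha)
        f_refl = score(refl)
        if f_refl > vals[0]:
            # Reflection beat the best vertex: try going further.
            exp = along(gamma)
            f_exp = score(exp)
            if f_exp > f_refl:
                verts[-1], vals[-1] = exp, f_exp
            else:
                verts[-1], vals[-1] = refl, f_refl
        elif f_refl > vals[-2]:
            # Reflection is better than the second-worst vertex: accept it.
            verts[-1], vals[-1] = refl, f_refl
        else:
            # Contract toward the centroid from the worst vertex.
            con = along(-rho)
            f_con = score(con)
            if f_con > vals[-1]:
                verts[-1], vals[-1] = con, f_con
            else:
                # Shrink every vertex toward the current best one.
                best = verts[0]
                for k in range(1, n + 1):
                    verts[k] = [b + sigma * (x - b)
                                for b, x in zip(best, verts[k])]
                    vals[k] = score(verts[k])

    k = max(range(n + 1), key=lambda j: vals[j])
    return verts[k], vals[k]


def toy_score(w):
    # Hypothetical stand-in for a performance measure; peaks at w = (1, -2).
    return -((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2)


best_w, best_score = polytope_maximize(toy_score, [0.0, 0.0])
print(best_w, best_score)
```

Because the search only compares scalar scores of candidate weight vectors, it needs neither gradients nor a learned value estimate, which is consistent with the abstract's point that no critic model is assumed.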
