Variables
float	DECAY_RATE = 0.99

	env = DPendulum()

list	h_rwd = []

float	LEARNING_RATE = 0.85

int	NEPISODES = 500

int	NSTEPS = 50

	NU = env.nu

	NX = env.nx

	Q = np.zeros([env.nx, env.nu])

float	Qref = reward + DECAY_RATE * np.max(Q[x2, :])

	RANDOM_SEED = int((time.time() % 10) * 1000)

	reward

float	rsum = 0.0

	u

	x = env.reset()

	x2

Detailed Description

Example of Q-table learning with a simple discretized 1-pendulum environment.
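A minimal sketch of the tabular Q-learning loop that the variables above suggest. `DPendulum` is not reproduced here: the `ToyChain` class is a hypothetical stand-in that only mimics the interface assumed from the listing (`nx`, `nu`, `reset()`, and a `step(u)` returning the next state and a reward). The update line `Q[x, u] += learning_rate * (Qref - Q[x, u])` is the standard Q-learning rule and is an assumption, since this page only documents how `Qref` and `u` are computed.

```python
import numpy as np


class ToyChain:
    """Hypothetical stand-in for DPendulum: a 1-D chain of discrete states."""

    def __init__(self, nx=11, nu=3):
        self.nx, self.nu = nx, nu
        self.goal = nx // 2
        self.x = 0

    def reset(self):
        self.x = np.random.randint(self.nx)
        return self.x

    def step(self, u):
        # With nu = 3, controls 0, 1, 2 map to moves -1, 0, +1.
        move = u - self.nu // 2
        self.x = int(np.clip(self.x + move, 0, self.nx - 1))
        reward = 1.0 if self.x == self.goal else 0.0
        return self.x, reward


def train(env, nepisodes=500, nsteps=50, learning_rate=0.85, decay_rate=0.99):
    Q = np.zeros([env.nx, env.nu])
    h_rwd = []  # history of the summed reward per episode
    # Assumption: episodes start at 1 so that the noise / episode term is defined.
    for episode in range(1, nepisodes):
        x = env.reset()
        rsum = 0.0
        for _ in range(nsteps):
            # Greedy action on Q-values perturbed by decaying Gaussian noise.
            u = int(np.argmax(Q[x, :] + np.random.randn(1, env.nu) / episode))
            x2, reward = env.step(u)
            # Reference value: immediate reward plus discounted best next value.
            Qref = reward + decay_rate * np.max(Q[x2, :])
            # Assumed standard Q-learning update toward the reference value.
            Q[x, u] += learning_rate * (Qref - Q[x, u])
            x = x2
            rsum += reward
        h_rwd.append(rsum)
    return Q, h_rwd


np.random.seed(0)
Q, h_rwd = train(ToyChain())
```

Here `h_rwd` collects the per-episode reward sum (`rsum`), which is one plausible reading of the `h_rwd` list declared above.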

Function Documentation

def qtable.rendertrial ( maxiter = 100 )

Roll out from a random initial state using the greedy policy.

Definition at line 31 of file qtable.py.
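A hedged sketch of what such a greedy roll-out might look like. The body of `rendertrial` is not shown on this page, so `rendertrial_sketch` is an illustrative name, `ToyEnv` is a hypothetical stand-in for `DPendulum`, and any rendering that the original name suggests is omitted.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in for DPendulum (interface assumed, not from the source)."""

    nx, nu = 5, 2

    def reset(self):
        self.x = np.random.randint(self.nx)
        return self.x

    def step(self, u):
        # u = 0 moves left, u = 1 moves right, clipped to the state range.
        self.x = int(np.clip(self.x + (1 if u == 1 else -1), 0, self.nx - 1))
        return self.x, float(self.x == self.nx - 1)


def rendertrial_sketch(env, Q, maxiter=100):
    """Follow the greedy policy argmax_u Q[x, u] from a random start state."""
    x = env.reset()
    states = [x]
    for _ in range(maxiter):
        u = int(np.argmax(Q[x, :]))  # greedy choice: no exploration noise
        x, reward = env.step(u)
        states.append(x)
    return states


Q = np.zeros([ToyEnv.nx, ToyEnv.nu])
Q[:, 1] = 1.0  # hand-made table that always prefers "move right"
states = rendertrial_sketch(ToyEnv(), Q, maxiter=10)
```

With this hand-made table the roll-out walks to the right-most state and stays there, which makes the greedy behavior easy to check by eye.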

u

Initial value:

 = np.argmax(Q[x, :] + np.random.randn(1, NU) / episode)

Definition at line 52 of file qtable.py.
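The initial value of `u` adds Gaussian noise scaled by `1 / episode` to the Q-values before taking the arg-max, so early episodes explore almost at random while later ones approach the purely greedy choice. A small sketch of that effect follows; the `noisy_greedy` and `greedy_fraction` helpers and the example Q-row are illustrative, and `episode` is assumed to start at 1 so the division is defined.

```python
import numpy as np


def noisy_greedy(q_row, episode, rng):
    """Arg-max of Q-values perturbed by N(0, 1) noise scaled by 1 / episode."""
    noise = rng.standard_normal((1, q_row.size)) / episode
    return int(np.argmax(q_row + noise))


rng = np.random.default_rng(0)
q_row = np.array([0.0, 0.5, 1.0])  # illustrative Q[x, :] for a single state


def greedy_fraction(episode, trials=2000):
    """Fraction of noisy picks that coincide with the noise-free arg-max."""
    picks = [noisy_greedy(q_row, episode, rng) for _ in range(trials)]
    return float(np.mean(np.array(picks) == np.argmax(q_row)))


early = greedy_fraction(episode=1)    # noise std 1.0: frequent exploration
late = greedy_fraction(episode=100)   # noise std 0.01: almost always greedy
```

Comparing `early` and `late` shows how the `1 / episode` factor acts as a decaying exploration schedule.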