pinocchio 3.3.1
A fast and flexible implementation of Rigid Body Dynamics algorithms and their analytical derivatives
qnet Namespace Reference

Classes

class  QValueNetwork
 — Q-value networks
 

Functions

def disturb (u, i)
 
def onehot (ix, n=NX)
 
def rendertrial (maxiter=100)
 

Variables

float DECAY_RATE = 0.99
 
 env = DPendulum()
 — Environment
 
 feed_dict
 
list h_rwd = []
 — History of rewards during training
 
float LEARNING_RATE = 0.1
 
int NEPISODES = 500
 — Hyperparameters
 
int NSTEPS = 50
 
 NU = env.nu
 
 NX = env.nx
 
 optim
 
 Q2 = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x2)})
 
 Qref = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x)})
 
 qvalue = QValueNetwork()
 
 RANDOM_SEED = int((time.time() % 10) * 1000)
 — Random seed
 
 reward
 
float rsum = 0.0
 
 sess = tf.InteractiveSession()
 
 u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0]
 
 x = env.reset()
 — Training
 
 x2
 

Detailed Description

Example of Q-table learning with a simple discretized 1-pendulum environment using a
linear Q network.
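
For orientation, the sketch below reassembles the members listed above into a plausible
training loop. It is a sketch, not the file's verbatim contents: the dpendulum import
path, the (x2, reward) return value of env.step(), and the qvalue.qref / qvalue.optim
attributes are assumptions; the remaining names all appear in the member list above.

    import numpy as np
    import tensorflow as tf              # TensorFlow 1.x, as suggested by tf.InteractiveSession()

    from dpendulum import DPendulum      # assumed module name for the pendulum environment

    NEPISODES  = 500                     # hyperparameters, as listed above
    NSTEPS     = 50
    DECAY_RATE = 0.99

    env    = DPendulum()
    NX, NU = env.nx, env.nu
    qvalue = QValueNetwork()             # class documented on this page; onehot() is defined in qnet.py
    sess   = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    h_rwd = []                           # history of rewards
    for episode in range(NEPISODES):
        x    = env.reset()
        rsum = 0.0
        for i in range(NSTEPS):
            # Greedy action from the network; the full example also perturbs it
            # with disturb(u, i) for exploration.
            u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0]
            x2, reward = env.step(u)     # assumed step() signature

            # TD target: keep the current predictions, overwrite only the value
            # of the action actually taken.
            Q2   = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x2)})
            Qref = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x)})
            Qref[0, u] = reward + DECAY_RATE * np.max(Q2)

            # One optimization step toward the TD target; qvalue.qref and
            # qvalue.optim are assumed attribute names, not shown on this page.
            sess.run(qvalue.optim, feed_dict={qvalue.x: onehot(x), qvalue.qref: Qref})

            rsum += reward
            x = x2
        h_rwd.append(rsum)

Only the executed action's value is pulled toward reward + DECAY_RATE * max Q(x2, .);
the TD target leaves the other entries of Qref at the network's own predictions.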

Function Documentation

◆ onehot()

def qnet.onehot(ix, n=NX)
Return a vector that is 0 everywhere except at index ix, which is set to 1.

Definition at line 58 of file qnet.py.
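
A NumPy sketch consistent with this description (not the verbatim definition from
qnet.py) is shown below. Returning a 1 x n row rather than a flat vector is an
assumption, chosen so the result can be fed directly to qvalue.x as a one-sample batch.

    import numpy as np

    NX = 21                              # stands in for env.nx from the example

    def onehot(ix, n=NX):
        """Return a 1 x n vector that is 0 everywhere except index ix, set to 1."""
        v = np.zeros((1, n))
        v[0, ix] = 1.0
        return v

    onehot(3)                            # shape (1, 21); the only nonzero entry is column 3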