pinocchio  3.3.1
A fast and flexible implementation of Rigid Body Dynamics algorithms and their analytical derivatives
continuous Namespace Reference

Classes

class  PolicyNetwork
 
class  QValueNetwork
 — Q-value and policy networks
 
class  ReplayItem
 

Functions

def rendertrial (maxiter=NSTEPS, verbose=True)
 

Variables

 batch
 
int BATCH_SIZE = 64
 
 d_batch = np.vstack([b.done for b in batch])
 
float DECAY_RATE = 0.99
 
bool done = False
 
 env = Pendulum(1)
 — Environment
 
 feed_dict
 
list h_qva = []
 
list h_rwd = []
 History of search.
 
list h_ste = []
 
tuple maxq
 
 n_init = tflearn.initializations.truncated_normal(seed=RANDOM_SEED)
 
int NEPISODES = 100
 — Hyperparameters
 
int NH1 = 250
 
int NSTEPS = 100
 
 NU = env.nu
 
 NX = env.nobs
 
 optim
 
 policy = PolicyNetwork().setupOptim()
 — TensorFlow initialization
 
float POLICY_LEARNING_RATE = 0.0001
 
 policyTarget = PolicyNetwork().setupTargetAssign(policy)
 
 q2_batch
 
 qgrad
 
 qref_batch = r_batch + (not d_batch) * (DECAY_RATE * q2_batch)
 
 qvalue = QValueNetwork().setupOptim()
 
float QVALUE_LEARNING_RATE = 0.001
 
 qvalueTarget = QValueNetwork().setupTargetAssign(qvalue)
 
 r
 
 r_batch = np.vstack([b.reward for b in batch])
 
 RANDOM_SEED = int((time.time() % 10) * 1000)
 — Random seed
 
int REPLAY_SIZE = 10000
 
 replayDeque = deque()
 
float rsum = 0.0
 
 sess = tf.InteractiveSession()
 
 u = sess.run(policy.policy, feed_dict={policy.x: x})
 
 u2_batch
 
 u_batch = np.vstack([b.u for b in batch])
 
 u_init = tflearn.initializations.uniform(minval=-0.003, maxval=0.003, seed=RANDOM_SEED)
 
 u_targ = sess.run(policy.policy, feed_dict={policy.x: x_batch})
 
float UPDATE_RATE = 0.01
 
 withSinCos
 
 x = env.reset().T
 — Training
 
 x2 = x2.T
 
 x2_batch = np.vstack([b.x2 for b in batch])
 
 x_batch = np.vstack([b.x for b in batch])
 

Detailed Description

Deep actor-critic network, from "Continuous control with deep reinforcement learning"
by Lillicrap et al., arXiv:1509.02971.
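
The target networks policyTarget and qvalueTarget, together with UPDATE_RATE, suggest the soft target update described by Lillicrap et al. The following is a minimal NumPy sketch of that update, assuming UPDATE_RATE plays the role of the mixing coefficient tau and that each network's parameters are held as a list of arrays. The function name soft_update is hypothetical and does not appear in continuous.py.

```python
import numpy as np


def soft_update(target_params, source_params, tau=0.01):
    """Blend source parameters into target parameters.

    Returns new arrays: tau * source + (1 - tau) * target, one per layer.
    tau is assumed to correspond to UPDATE_RATE.
    """
    return [
        tau * src + (1.0 - tau) * tgt
        for src, tgt in zip(source_params, target_params)
    ]


# Usage: the target moves 1% of the way toward the source at each call.
target = [np.zeros(2)]
source = [np.ones(2)]
target = soft_update(target, source, tau=0.01)
```

A small tau keeps the target networks slowly varying, which is the stabilizing role the paper assigns to them.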

Variable Documentation

◆ batch

batch
Initial value:
= random.sample(
    replayDeque, BATCH_SIZE
)

Definition at line 198 of file continuous.py.
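
The sampled batch is then stacked field by field into x_batch, u_batch, r_batch, d_batch and x2_batch. A self-contained sketch of that sampling and stacking step follows. The ReplayItem class here is a hypothetical stand-in whose attribute names (x, u, reward, done, x2) are taken from the list comprehensions in the Variables section, and sample_batch is an illustrative helper, not a function of continuous.py.

```python
import random
from collections import deque

import numpy as np


class ReplayItem:
    """Hypothetical stand-in for the ReplayItem class of this namespace."""

    def __init__(self, x, u, reward, done, x2):
        self.x = x            # state before the step, shape (1, NX)
        self.u = u            # control applied, shape (1, NU)
        self.reward = reward  # scalar reward
        self.done = done      # True if the episode ended on this step
        self.x2 = x2          # state after the step, shape (1, NX)


def sample_batch(replay, batch_size, rng=random):
    """Draw batch_size distinct transitions and stack each field row-wise."""
    # Copy to a list so that sampling indexes in constant time.
    batch = rng.sample(list(replay), batch_size)
    return {
        "x": np.vstack([b.x for b in batch]),
        "u": np.vstack([b.u for b in batch]),
        "reward": np.vstack([b.reward for b in batch]),
        "done": np.vstack([b.done for b in batch]),
        "x2": np.vstack([b.x2 for b in batch]),
    }
```

Because np.vstack promotes scalars to 2-D, the reward and done fields come out as column vectors with one row per sampled transition.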

◆ maxq

tuple maxq
Initial value:
= (
    np.max(
        sess.run(qvalue.qvalue, feed_dict={qvalue.x: x_batch, qvalue.u: u_batch})
    )
    if "x_batch" in locals()
    else 0
)

Definition at line 244 of file continuous.py.

◆ q2_batch

q2_batch
Initial value:
= sess.run(
    qvalueTarget.qvalue,
    feed_dict={qvalueTarget.x: x2_batch, qvalueTarget.u: u2_batch},
)

Definition at line 211 of file continuous.py.
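
q2_batch feeds the critic reference qref_batch listed under Variables: the reward plus the discounted target Q-value, with the bootstrap term dropped for terminal transitions. The sketch below reproduces that formula in NumPy. Note that Python's `not` applied to an array with more than one element raises a ValueError, so the sketch uses np.logical_not for the element-wise mask. The function name td_target is hypothetical.

```python
import numpy as np


def td_target(r_batch, d_batch, q2_batch, decay_rate=0.99):
    """Element-wise critic target: r + (1 - done) * decay_rate * Q'(x2, u2).

    r_batch, d_batch and q2_batch are column vectors with one row per
    transition; decay_rate is assumed to correspond to DECAY_RATE.
    """
    not_done = np.logical_not(d_batch)  # True where the episode continues
    return r_batch + not_done * (decay_rate * q2_batch)


# Usage: the second transition is terminal, so its target is the reward alone.
r = np.array([[1.0], [2.0]])
d = np.array([[False], [True]])
q2 = np.array([[10.0], [10.0]])
qref = td_target(r, d, q2)
```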

◆ qgrad

qgrad
Initial value:
= sess.run(
    qvalue.gradient, feed_dict={qvalue.x: x_batch, qvalue.u: u_targ}
)

Definition at line 229 of file continuous.py.
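
qgrad is the gradient of the critic with respect to the control, evaluated at the controls u_targ proposed by the policy. In the deterministic policy gradient of Lillicrap et al., this quantity is chained with the policy's own gradient to improve the policy. The toy example below illustrates the chain rule only: the linear policy u = x W, the quadratic critic and all function names are assumptions made for illustration and do not reflect the networks in continuous.py.

```python
import numpy as np


def toy_q(x, u, A):
    """Hypothetical critic: Q(x, u) = -0.5 * ||u - x A||^2, one value per row."""
    return -0.5 * np.sum((u - x @ A) ** 2, axis=1, keepdims=True)


def toy_q_grad_u(x, u, A):
    """Gradient of the toy critic with respect to the control u."""
    return -(u - x @ A)


def policy_ascent_step(W, x_batch, A, lr=0.05):
    """One gradient-ascent step on a linear policy u = x W."""
    u_targ = x_batch @ W                      # controls proposed by the policy
    qgrad = toy_q_grad_u(x_batch, u_targ, A)  # dQ/du at those controls
    # Chain rule for the batch mean of Q: dJ/dW = X^T (dQ/du) / batch size.
    grad_W = x_batch.T @ qgrad / len(x_batch)
    return W + lr * grad_W
```

With a small step size, each call moves W in the direction that increases the mean critic value over the batch.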

◆ u2_batch

u2_batch
Initial value:
= sess.run(
    policyTarget.policy, feed_dict={policyTarget.x: x2_batch}
)

Definition at line 208 of file continuous.py.