Function shape_rewards_pbrs 

pub fn shape_rewards_pbrs(
    rewards: &[f64],
    potentials_current: &[f64],
    potentials_next: &[f64],
    gamma: f64,
    dones: &[f64],
) -> Result<Vec<f64>, RloxError>

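Computes potential-based shaped rewards (PBRS): r' = r + gamma * Phi(s') - Phi(s), where Phi is the potential function, s is the current state, and s' is the next state.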

At episode boundaries (dones[i] == 1.0), the potential difference is zeroed out: r'_i = r_i (no shaping across episode boundaries).

Arguments

  • rewards - raw rewards, length N
  • potentials_current - Phi(s_t), length N
  • potentials_next - Phi(s_{t+1}), length N
  • gamma - discount factor
  • dones - episode termination flags (1.0 = done), length N
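Example

The formula and the episode-boundary rule above can be sketched as a standalone function. This is a minimal illustration, not the crate's actual implementation: the name shape_rewards_sketch is hypothetical, String stands in for RloxError, and returning an error on mismatched slice lengths is an assumption based on the "length N" requirement in the argument list.

```rust
/// Hypothetical re-implementation of the documented behavior, for illustration only.
/// `String` stands in for the crate's `RloxError` type.
fn shape_rewards_sketch(
    rewards: &[f64],
    potentials_current: &[f64],
    potentials_next: &[f64],
    gamma: f64,
    dones: &[f64],
) -> Result<Vec<f64>, String> {
    let n = rewards.len();
    // Assumption: every input slice must have the same length N.
    if potentials_current.len() != n || potentials_next.len() != n || dones.len() != n {
        return Err(format!(
            "length mismatch: rewards={}, potentials_current={}, potentials_next={}, dones={}",
            n,
            potentials_current.len(),
            potentials_next.len(),
            dones.len()
        ));
    }

    let shaped = (0..n)
        .map(|i| {
            if dones[i] == 1.0 {
                // Episode boundary: no shaping, r'_i = r_i.
                rewards[i]
            } else {
                // r'_i = r_i + gamma * Phi(s_{t+1}) - Phi(s_t)
                rewards[i] + gamma * potentials_next[i] - potentials_current[i]
            }
        })
        .collect();

    Ok(shaped)
}
```

With rewards [1.0, 0.0, 2.0], potentials_current [0.5, 1.0, 2.0], potentials_next [2.0, 4.0, 0.0], gamma 0.5, and dones [0.0, 0.0, 1.0], the first two steps receive the shaping term (giving 1.5 and 1.0), while the last step is an episode boundary and keeps its raw reward of 2.0.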