Function shape_rewards_pbrs

pub fn shape_rewards_pbrs(
    rewards: &[f64],
    potentials_current: &[f64],
    potentials_next: &[f64],
    gamma: f64,
    dones: &[f64],
) -> Result<Vec<f64>, RloxError>

Compute potential-based shaped rewards (PBRS): r' = r + gamma * Phi(s') - Phi(s)

At episode boundaries (dones[i] == 1.0), the shaping term gamma * Phi(s') - Phi(s) is zeroed out, so r'_i = r_i and no shaping is applied across episodes.

§Arguments

rewards - raw rewards, length N
potentials_current - Phi(s_t), length N
potentials_next - Phi(s_{t+1}), length N
gamma - discount factor
dones - episode termination flags (1.0 = done), length N
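
The element-wise rule described above can be sketched as a standalone function. This is an illustrative sketch only, not the crate's implementation: the name shape_rewards_sketch is hypothetical, String stands in for RloxError, and returning an error on mismatched slice lengths is an assumption, since the error conditions are not documented here.

```rust
/// Hypothetical stand-in for `shape_rewards_pbrs`, for illustration only.
/// Uses `String` as the error type in place of `RloxError`.
fn shape_rewards_sketch(
    rewards: &[f64],
    potentials_current: &[f64],
    potentials_next: &[f64],
    gamma: f64,
    dones: &[f64],
) -> Result<Vec<f64>, String> {
    let n = rewards.len();

    // Assumption: slices of differing length are rejected rather than truncated.
    if potentials_current.len() != n || potentials_next.len() != n || dones.len() != n {
        return Err(format!("all input slices must have length {}", n));
    }

    let shaped = (0..n)
        .map(|i| {
            if dones[i] == 1.0 {
                // Episode boundary: leave the raw reward untouched.
                rewards[i]
            } else {
                // r' = r + gamma * Phi(s') - Phi(s)
                rewards[i] + gamma * potentials_next[i] - potentials_current[i]
            }
        })
        .collect();

    Ok(shaped)
}
```

With rewards [1.0, 0.0, 2.0], potentials_current [0.5, 1.0, 4.0], potentials_next [1.0, 4.0, 0.0], gamma 0.5, and dones [0.0, 0.0, 1.0], the sketch returns [1.0, 1.0, 2.0]: the first two steps are shaped, and the final step keeps its raw reward because the episode has ended.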
