pub fn shape_rewards_pbrs(
    rewards: &[f64],
    potentials_current: &[f64],
    potentials_next: &[f64],
    gamma: f64,
    dones: &[f64],
) -> Result<Vec<f64>, RloxError>
Compute shaped rewards: r' = r + gamma * Phi(s') - Phi(s)
At episode boundaries (dones[i] == 1.0), the potential difference
is zeroed out: r'_i = r_i (no shaping across episode boundaries).
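As a minimal sketch of the per-step rule described above, the scalar helper below adds the shaping term only when the transition is not terminal. The name `shape_step` is hypothetical and not part of this crate's API. The `done == 1.0` check follows the flag convention stated above.

```rust
/// Hypothetical per-step PBRS helper (illustration only, not the crate's API).
/// Returns r + gamma * phi_next - phi, or r unchanged when `done == 1.0`.
fn shape_step(r: f64, phi: f64, phi_next: f64, gamma: f64, done: f64) -> f64 {
    if done == 1.0 {
        // Episode boundary: the potential difference is zeroed out.
        r
    } else {
        // r' = r + gamma * Phi(s') - Phi(s)
        r + gamma * phi_next - phi
    }
}
```

For example, with gamma = 0.5, r = 1.0, Phi(s) = 0.25 and Phi(s') = 2.0, the shaped reward is 1.0 + 0.5 * 2.0 - 0.25 = 1.75. If the step is terminal, the result is the raw reward, 1.0.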
Arguments
- rewards - raw rewards, length N
- potentials_current - Phi(s_t), length N
- potentials_next - Phi(s_{t+1}), length N
- gamma - discount factor
- dones - episode termination flags (1.0 = done), length N
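The following is a minimal batched sketch of the argument contract above. It assumes that inputs whose lengths differ from rewards.len() are rejected. This page does not say which conditions produce an RloxError, so `ShapeError` is a hypothetical stand-in, and both the length check and the name `shape_rewards_sketch` are assumptions.

```rust
/// Hypothetical error type standing in for `RloxError` (assumption).
#[derive(Debug, PartialEq)]
enum ShapeError {
    LengthMismatch { name: &'static str, expected: usize, got: usize },
}

/// Sketch of batched PBRS:
/// r'_i = r_i + gamma * Phi(s_{t+1})_i - Phi(s_t)_i, or r'_i = r_i when dones_i == 1.0.
fn shape_rewards_sketch(
    rewards: &[f64],
    potentials_current: &[f64],
    potentials_next: &[f64],
    gamma: f64,
    dones: &[f64],
) -> Result<Vec<f64>, ShapeError> {
    let n = rewards.len();
    // Assumed validation: every per-step input must have length N.
    // Without it, `zip` would silently truncate to the shortest slice.
    for (name, len) in [
        ("potentials_current", potentials_current.len()),
        ("potentials_next", potentials_next.len()),
        ("dones", dones.len()),
    ] {
        if len != n {
            return Err(ShapeError::LengthMismatch { name, expected: n, got: len });
        }
    }
    let shaped = rewards
        .iter()
        .zip(potentials_current)
        .zip(potentials_next)
        .zip(dones)
        .map(|(((&r, &phi), &phi_next), &done)| {
            // No shaping across episode boundaries.
            if done == 1.0 { r } else { r + gamma * phi_next - phi }
        })
        .collect();
    Ok(shaped)
}
```

The explicit length check is a deliberate choice. Because `Iterator::zip` stops at the shortest input, a mismatched buffer would otherwise produce a shorter output without any error.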