compute_gae

Function compute_gae 

pub fn compute_gae(
    rewards: &[f64],
    values: &[f64],
    dones: &[f64],
    last_value: f64,
    gamma: f64,
    gae_lambda: f64,
) -> (Vec<f64>, Vec<f64>)

Compute Generalized Advantage Estimation.

Iterates backwards over the rollout, computing:

    delta_t  = reward_t + gamma * V(t+1) * (1 - done_t) - V(t)
    A_t      = delta_t + gamma * lambda * (1 - done_t) * A(t+1)
    return_t = A_t + V(t)

The dones slice uses f64 where 0.0 = not done, 1.0 = done, matching the common Python/numpy convention.
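The recurrence above can be sketched as follows. This is an illustrative reimplementation, not the crate's actual source: the name compute_gae_sketch is hypothetical, and three details are assumptions that the description does not state: last_value stands in for V(t+1) at the final step, A(t+1) is taken as 0.0 past the end of the rollout, and the returned tuple is ordered (advantages, returns).

```rust
/// Hypothetical sketch of the backward GAE recurrence.
/// Assumptions (not confirmed by the documentation):
/// - `last_value` is used as V(t+1) for the final step,
/// - A(t+1) is 0.0 beyond the end of the rollout,
/// - the result is ordered as (advantages, returns).
pub fn compute_gae_sketch(
    rewards: &[f64],
    values: &[f64],
    dones: &[f64],
    last_value: f64,
    gamma: f64,
    gae_lambda: f64,
) -> (Vec<f64>, Vec<f64>) {
    // Length checks that are only active in debug builds.
    debug_assert_eq!(rewards.len(), values.len());
    debug_assert_eq!(rewards.len(), dones.len());

    let n = rewards.len();
    let mut advantages = vec![0.0; n];
    let mut returns = vec![0.0; n];

    // State carried from step t+1 while walking backwards.
    let mut next_value = last_value;
    let mut next_advantage = 0.0;

    for t in (0..n).rev() {
        // 1.0 while the episode continues, 0.0 at a terminal step.
        let not_done = 1.0 - dones[t];

        // delta_t = reward_t + gamma * V(t+1) * (1 - done_t) - V(t)
        let delta = rewards[t] + gamma * next_value * not_done - values[t];

        // A_t = delta_t + gamma * lambda * (1 - done_t) * A(t+1)
        let advantage = delta + gamma * gae_lambda * not_done * next_advantage;

        advantages[t] = advantage;
        // return_t = A_t + V(t)
        returns[t] = advantage + values[t];

        next_value = values[t];
        next_advantage = advantage;
    }

    (advantages, returns)
}
```

Because done_t multiplies both the bootstrap term and the carried advantage, a 1.0 entry cuts the recurrence at an episode boundary: neither last_value nor any later advantage leaks into the steps before it.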

Panics

Panics in debug builds if rewards, values, and dones have different lengths.