pub fn compute_gae(
rewards: &[f64],
values: &[f64],
dones: &[f64],
last_value: f64,
gamma: f64,
gae_lambda: f64,
) -> (Vec<f64>, Vec<f64>)
Compute Generalized Advantage Estimation.
Iterates backwards over the rollout, computing:
    delta_t = reward_t + gamma * V(t+1) * (1 - done_t) - V(t)
    A_t = delta_t + gamma * lambda * (1 - done_t) * A(t+1)
    return_t = A_t + V(t)
The dones slice uses f64 where 0.0 = not done, 1.0 = done,
matching the common Python/numpy convention.
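The backward recurrence can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's actual source: the name gae_sketch is hypothetical, the return order (advantages, returns) is assumed, last_value is assumed to stand in for V(t+1) at the final step, A(t+1) is assumed to start at 0.0, and the length checks are assumed to be debug assertions.

```rust
/// Hypothetical re-implementation of the recurrence, for illustration only.
/// Assumed return order: (advantages, returns).
fn gae_sketch(
    rewards: &[f64],
    values: &[f64],
    dones: &[f64],
    last_value: f64,
    gamma: f64,
    gae_lambda: f64,
) -> (Vec<f64>, Vec<f64>) {
    // Assumption: length mismatches are only checked in debug builds.
    debug_assert_eq!(rewards.len(), values.len());
    debug_assert_eq!(rewards.len(), dones.len());

    let n = rewards.len();
    let mut advantages = vec![0.0; n];
    let mut returns = vec![0.0; n];

    // State carried from step t+1 while walking backwards.
    let mut next_advantage = 0.0; // assumed A(t+1) past the end of the rollout
    let mut next_value = last_value; // assumed V(t+1) for the final step

    for t in (0..n).rev() {
        // done_t is 1.0 at episode boundaries, which zeroes the bootstrap terms.
        let not_done = 1.0 - dones[t];
        let delta = rewards[t] + gamma * next_value * not_done - values[t];
        let advantage = delta + gamma * gae_lambda * not_done * next_advantage;

        advantages[t] = advantage;
        returns[t] = advantage + values[t];

        next_advantage = advantage;
        next_value = values[t];
    }

    (advantages, returns)
}
```

For rewards [1.0, 1.0], values [0.5, 0.5], dones [0.0, 1.0], last_value 0.0, and gamma = gae_lambda = 1.0, this sketch yields advantages [1.5, 0.5] and returns [2.0, 1.0]. The done flag on the last step cuts off bootstrapping from last_value, so the final advantage is just reward minus value.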
§Panics
Panics in debug builds if rewards, values, and dones have
different lengths.