Potential-based reward shaping (PBRS) and goal-distance potentials.
Implements the PBRS transform r' = r + gamma * Phi(s') - Phi(s) which
preserves the optimal policy (Ng et al., 1999), plus goal-distance
potential computation for goal-conditioned RL.
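
The transform above can be sketched in a few lines. This is a minimal illustration, not this module's API: the names `shape_rewards_sketch` and `discounted_return`, the slice-based signatures, and the choice of `f64` are all assumptions made for the example.

```rust
/// Hypothetical sketch (not this crate's signature): apply
/// r' = r + gamma * Phi(s') - Phi(s) elementwise over a batch of transitions.
/// `phi[i]` is Phi(s) and `phi_next[i]` is Phi(s') for transition i.
fn shape_rewards_sketch(rewards: &[f64], phi: &[f64], phi_next: &[f64], gamma: f64) -> Vec<f64> {
    assert_eq!(rewards.len(), phi.len(), "rewards and phi must have equal length");
    assert_eq!(rewards.len(), phi_next.len(), "rewards and phi_next must have equal length");
    rewards
        .iter()
        .zip(phi.iter().zip(phi_next.iter()))
        .map(|(r, (p, pn))| r + gamma * pn - p)
        .collect()
}

/// Hypothetical helper: discounted return sum_t gamma^t * r_t of one trajectory,
/// accumulated from the last reward backwards.
fn discounted_return(rewards: &[f64], gamma: f64) -> f64 {
    rewards.iter().rev().fold(0.0, |acc, r| r + gamma * acc)
}
```

Along a single trajectory s_0, ..., s_T the shaping terms telescope, so the shaped discounted return differs from the original one by gamma^T * Phi(s_T) - Phi(s_0). That offset depends only on the start state and the final state's potential, not on the actions taken in between, which is the intuition behind the policy-invariance result.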
Structs
- GoalDistanceTransform - Goal-distance reward transform.
- PBRSTransform - Potential-based reward shaping transform.
- RewardContext - Context passed to reward transforms for access to transition metadata.
Traits
- RewardTransform - Trait for composable reward transformations.
Functions
- compute_goal_distance_potentials - Goal-distance potential: Phi(s) = -scale * ||s[goal_slice] - goal||_2
- shape_rewards_pbrs - Compute shaped rewards: r' = r + gamma * Phi(s') - Phi(s)
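
The goal-distance potential formula can be illustrated for a single state as follows. This is a sketch under assumptions, not the signature of `compute_goal_distance_potentials`: the function name `goal_distance_potential_sketch`, the use of a `Range<usize>` for `goal_slice`, and the `f64` element type are all hypothetical.

```rust
/// Hypothetical sketch (not this crate's signature): compute
/// Phi(s) = -scale * ||s[goal_slice] - goal||_2 for one state vector.
/// `goal_slice` selects the state components that are compared to `goal`.
fn goal_distance_potential_sketch(
    state: &[f64],
    goal_slice: std::ops::Range<usize>,
    goal: &[f64],
    scale: f64,
) -> f64 {
    let achieved = &state[goal_slice];
    assert_eq!(achieved.len(), goal.len(), "goal_slice and goal must have equal length");
    // Squared Euclidean distance between the selected components and the goal.
    let squared_distance: f64 = achieved
        .iter()
        .zip(goal)
        .map(|(s, g)| (s - g).powi(2))
        .sum();
    -scale * squared_distance.sqrt()
}
```

With a positive `scale`, the potential is zero at the goal and grows more negative with distance, so a transition that moves toward the goal has Phi(s') > Phi(s).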