Module reward_shaping

Potential-based reward shaping (PBRS) and goal-distance potentials.

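Implements the PBRS transform r' = r + gamma * Phi(s') - Phi(s), where gamma is the discount factor and Phi is a potential function over states. Shaping of this form preserves the optimal policy (Ng et al., 1999). The module also provides goal-distance potential computation for goal-conditioned RL.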
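
The transform can be illustrated with a minimal sketch over a batch of transitions. The function name `shape_rewards_sketch` and its slice-based signature are assumptions made for illustration; they are not the signature of this module's `shape_rewards_pbrs`, which may take different types.

```rust
/// Hypothetical helper (not this module's API): applies
/// r' = r + gamma * Phi(s') - Phi(s) element-wise over a batch.
///
/// `phi_s[i]` is the potential of the state the i-th transition starts in,
/// `phi_next[i]` the potential of the state it ends in.
fn shape_rewards_sketch(
    rewards: &[f64],
    phi_s: &[f64],
    phi_next: &[f64],
    gamma: f64,
) -> Vec<f64> {
    // All three slices must describe the same set of transitions.
    assert_eq!(rewards.len(), phi_s.len(), "rewards and phi_s differ in length");
    assert_eq!(rewards.len(), phi_next.len(), "rewards and phi_next differ in length");

    rewards
        .iter()
        .zip(phi_s)
        .zip(phi_next)
        .map(|((r, p), p_next)| *r + gamma * *p_next - *p)
        .collect()
}
```

Calling `shape_rewards_sketch(&[1.0, 0.0], &[-2.0, -1.0], &[-1.0, 0.0], 0.5)` yields `[2.5, 1.0]`: the first transition gets 1.0 + 0.5 * (-1.0) - (-2.0) = 2.5, and the second gets 0.0 + 0.5 * 0.0 - (-1.0) = 1.0. Both transitions move toward higher potential, so both receive a positive shaping bonus.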

Structs

GoalDistanceTransform
Goal-distance reward transform.
PBRSTransform
Potential-based reward shaping transform.
RewardContext
Context passed to reward transforms for access to transition metadata.

Traits

RewardTransform
Trait for composable reward transformations.

Functions

compute_goal_distance_potentials
Goal-distance potential: Phi(s) = -scale * ||s[goal_slice] - goal||_2
shape_rewards_pbrs
Compute shaped rewards: r' = r + gamma * Phi(s') - Phi(s)
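
The goal-distance potential listed above can be sketched for a single state as follows. The function name `goal_distance_potential_sketch`, the use of `Range<usize>` for `goal_slice`, and the `f64` element type are assumptions for illustration; the actual `compute_goal_distance_potentials` may operate on batches or on different types.

```rust
use std::ops::Range;

/// Hypothetical helper (not this module's API): computes
/// Phi(s) = -scale * ||s[goal_slice] - goal||_2 for one state vector.
fn goal_distance_potential_sketch(
    state: &[f64],
    goal_slice: Range<usize>,
    goal: &[f64],
    scale: f64,
) -> f64 {
    // Select the components of the state that are compared against the goal.
    let achieved = &state[goal_slice];
    assert_eq!(
        achieved.len(),
        goal.len(),
        "goal_slice and goal must have the same length"
    );

    // Squared Euclidean distance between the achieved and desired goal.
    let squared_distance: f64 = achieved
        .iter()
        .zip(goal)
        .map(|(a, g)| (a - g).powi(2))
        .sum();

    // Negated so that states closer to the goal have higher potential.
    -scale * squared_distance.sqrt()
}
```

For a state `[3.0, 4.0, 9.0]` with `goal_slice = 0..2`, `goal = [0.0, 0.0]`, and `scale = 2.0`, the distance is 5.0 and the potential is -10.0. The potential reaches its maximum of zero exactly at the goal, so moving toward the goal raises Phi and produces a positive shaping term under the PBRS formula.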