pub trait StochasticPolicy {
    // Required methods
    fn sample_actions(
        &self,
        obs: &TensorData,
    ) -> Result<(TensorData, TensorData), NNError>;
    fn deterministic_action(
        &self,
        obs: &TensorData,
    ) -> Result<TensorData, NNError>;
    fn learning_rate(&self) -> f32;
    fn set_learning_rate(&mut self, lr: f32);
    fn save(&self, path: &Path) -> Result<(), NNError>;
    fn load(&mut self, path: &Path) -> Result<(), NNError>;
}
Continuous stochastic policy for SAC.
Training steps (sac_actor_step) are intentionally NOT on this trait because they require autograd to flow through the critic's Q-network. Trait methods convert tensors to TensorData (Vec<f32>-backed), which detaches them from the autograd graph; for training, call the concrete type's sac_actor_step method instead.
Required Methods

fn sample_actions(
    &self,
    obs: &TensorData,
) -> Result<(TensorData, TensorData), NNError>

Sample actions with the reparameterization trick. Returns (squashed_actions [batch, act_dim], log_probs [batch]).
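As a sketch of what such an implementation typically computes per action dimension, assuming a diagonal tanh-squashed Gaussian head (the helper name and scalar formulation below are illustrative, not part of the crate):

```rust
use std::f32::consts::PI;

/// Reparameterized sample for one scalar action dimension:
///   pre = mean + std * eps,  a = tanh(pre),
///   log p(a) = log N(pre; mean, std) - log(1 - tanh(pre)^2)
/// The second term is the change-of-variables correction for the
/// tanh squashing.
fn sample_tanh_gaussian(mean: f32, log_std: f32, eps: f32) -> (f32, f32) {
    let std = log_std.exp();
    let pre = mean + std * eps; // gradient flows through mean and std
    let action = pre.tanh();
    let log_prob_gauss =
        -0.5 * ((pre - mean) / std).powi(2) - log_std - 0.5 * (2.0 * PI).ln();
    // Clamp avoids log(0) when the action saturates near +/-1.
    let log_det = (1.0 - action * action).max(1e-6).ln();
    (action, log_prob_gauss - log_det)
}

fn main() {
    let (a, lp) = sample_tanh_gaussian(0.3, -1.0, 0.5);
    assert!(a > -1.0 && a < 1.0); // squashed into the action range
    assert!(lp.is_finite());
    println!("action = {a:.4}, log_prob = {lp:.4}");
}
```

With eps drawn from a standard normal per sample, this is the usual reparameterized tanh-Gaussian used by SAC actors; here eps is passed in explicitly so the sketch stays deterministic.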
fn deterministic_action(&self, obs: &TensorData) -> Result<TensorData, NNError>

Deterministic action (the distribution mean passed through the squashing function).