Module f64_ops

Functions

compute_batch_group_advantages
Batched GRPO group advantages: process all groups in a single call.
compute_batch_token_kl
Batched token-level KL divergence: process all sequences in a single call.
compute_batch_token_kl_schulman
Batched token-level KL divergence using the Schulman (2020) estimator.
compute_group_advantages
GRPO group advantage: (reward - mean) / std. Returns zeros if std < 1e-8.
compute_token_kl
Token-level KL divergence: sum(exp(log_p) * (log_p - log_q)).
compute_token_kl_schulman
Token-level KL divergence using the Schulman (2020) estimator: sum(exp(log_p - log_q) - (log_p - log_q) - 1).
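The three formulas quoted in the summaries above can be sketched in plain Rust as follows. This is a minimal illustration, not the module's implementation: the function names (without the `compute_` prefix), the slice-based signatures, and the use of the population standard deviation are all assumptions, and `log_p` / `log_q` are assumed to be per-token log-probabilities of equal length.

```rust
/// Hypothetical stand-in for the group advantage formula:
/// (reward - mean) / std, with zeros returned when std < 1e-8.
/// Assumption: population standard deviation (divide by n).
fn group_advantages(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let var = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    if std < 1e-8 {
        // Degenerate group: all rewards are (nearly) identical.
        return vec![0.0; rewards.len()];
    }
    rewards.iter().map(|r| (r - mean) / std).collect()
}

/// Hypothetical stand-in for the token-level KL formula:
/// sum(exp(log_p) * (log_p - log_q)).
fn token_kl(log_p: &[f64], log_q: &[f64]) -> f64 {
    log_p
        .iter()
        .zip(log_q)
        .map(|(lp, lq)| lp.exp() * (lp - lq))
        .sum()
}

/// Hypothetical stand-in for the Schulman (2020) estimator formula:
/// sum(exp(d) - d - 1) with d = log_p - log_q.
/// Each term is non-negative because exp(d) >= 1 + d for all real d.
fn token_kl_schulman(log_p: &[f64], log_q: &[f64]) -> f64 {
    log_p
        .iter()
        .zip(log_q)
        .map(|(lp, lq)| {
            let d = lp - lq;
            d.exp() - d - 1.0
        })
        .sum()
}
```

In this sketch, a group with identical rewards yields all-zero advantages, and both KL variants evaluate to zero when `log_p` and `log_q` are equal.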