
rlox

Rust-accelerated reinforcement learning.
3-50x faster than SB3. Zero-copy PyO3 bindings.

  • 180K SPS with the Candle hybrid
  • 147x faster GAE vs NumPy
  • 18 algorithms
  • 7 Rust crates

Quick Start

```bash
# Install
pip install rlox
```

```python
# Train PPO on CartPole (3 lines)
from rlox.trainers import PPOTrainer
metrics = PPOTrainer(env="CartPole-v1").train(50_000)
print(f"Reward: {metrics['mean_reward']:.0f}")
```

```bash
# CLI (no code required)
python -m rlox train --algo ppo --env CartPole-v1 --timesteps 100000
```

Why rlox?

|        | vs SB3                       | vs CleanRL           | vs RLlib            | vs TRL                |
|--------|------------------------------|----------------------|---------------------|-----------------------|
| Speed  | 3-50x faster data plane      | 2x faster collection | Same (single-node)  | 4-14x faster KL/GRPO  |
| Scope  | Same algorithms + offline RL | More algorithms      | Lightweight, no Ray | Complementary         |
| Unique | Rust VecEnv + Candle         | Offline RL + LLM ops | Rust buffers + GAE  | Rust sequence packing |
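For context on the GAE comparisons above: the NumPy baseline is the textbook backward recursion over a rollout. A minimal sketch of that baseline (a reference implementation with illustrative names, not rlox's internal code):

```python
import numpy as np

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout (backward recursion)."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # mask bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
        next_value = values[t]
    return advantages, advantages + values  # (advantages, value targets)
```

The Python-level loop over timesteps is exactly the kind of per-step overhead a compiled Rust kernel removes.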

Algorithms

On-Policy

  • PPO
  • A2C
  • IMPALA
  • MAPPO

Off-Policy

  • SAC
  • TD3
  • DQN (Rainbow)

Offline RL

  • TD3+BC
  • IQL
  • CQL
  • BC
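Of the offline methods above, IQL is built around expectile regression of the value function. A tiny NumPy sketch of the standard expectile loss (reference math only, with illustrative names; not rlox's implementation):

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss: positive errors weighted by tau, negative by 1 - tau."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2
```

With tau > 0.5 the loss penalizes underestimation more lightly, which is what lets IQL approximate a maximum over actions without querying out-of-distribution actions.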

LLM Post-Training

  • GRPO
  • DPO
  • OnlineDPO
  • BestOfN
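The LLM post-training methods above share a common primitive: a preference loss over (chosen, rejected) completion pairs. As a refresher, a minimal NumPy sketch of the standard DPO objective (reference math with illustrative names, not rlox's implementation):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO: -log sigmoid of the beta-scaled implicit reward margin."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))
```

At zero margin the loss is log 2 (chance level); it decreases as the policy prefers the chosen completion more than the reference model does.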

Model-Based

  • DreamerV3

Hybrid

  • HybridPPO (Candle)

Installation

```bash
# From PyPI
pip install rlox

# From source (requires Rust toolchain)
git clone https://github.com/riserally/rlox.git
cd rlox && pip install -e ".[dev]"
```

Getting Started

Installation, first training run, step-by-step tutorial.

Python Guide

Complete user guide with 3-level API reference.

Examples

Copy-paste code for every algorithm.

Rust API

Auto-generated crate documentation.

Tutorials

Custom networks, exploration, offline RL, Candle hybrid.

Blog

Benchmarks, deep-dives, and release notes.