
rlox

Rust-accelerated reinforcement learning.
3-50x faster than SB3. Zero-copy PyO3 bindings.

  • 180K SPS with the Candle hybrid
  • 147x faster GAE vs NumPy
  • 18 algorithms
  • 7 Rust crates

Quick Start

```bash
# Install
pip install rlox
```

```python
# Train PPO on CartPole (3 lines)
from rlox.trainers import PPOTrainer
metrics = PPOTrainer(env="CartPole-v1").train(50_000)
print(f"Reward: {metrics['mean_reward']:.0f}")
```

```bash
# CLI (no code required)
python -m rlox train --algo ppo --env CartPole-v1 --timesteps 100000
```

Why rlox?

|        | vs SB3                       | vs CleanRL           | vs RLlib            | vs TRL                |
|--------|------------------------------|----------------------|---------------------|-----------------------|
| Speed  | 3-50x faster data plane      | 2x faster collection | Same (single-node)  | 4-14x faster KL/GRPO  |
| Scope  | Same algorithms + offline RL | More algorithms      | Lightweight, no Ray | Complementary         |
| Unique | Rust VecEnv + Candle         | Offline RL + LLM ops | Rust buffers + GAE  | Rust sequence packing |
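For context on the GAE comparisons above: the NumPy baseline is the textbook backward recursion over a rollout. A minimal sketch of that baseline (a reference implementation with illustrative names, not rlox's internal code):

```python
import numpy as np

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout (backward recursion)."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # mask bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
        next_value = values[t]
    return advantages, advantages + values  # (advantages, value targets)
```

The Python-level loop over timesteps is exactly the kind of per-step overhead a compiled Rust kernel removes.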

Algorithms

On-Policy

  • PPO
  • A2C
  • IMPALA
  • MAPPO

Off-Policy

  • SAC
  • TD3
  • DQN (Rainbow)

Offline RL

  • TD3+BC
  • IQL
  • CQL
  • BC
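Of the offline methods above, IQL is built around expectile regression of the value function. A tiny NumPy sketch of the standard expectile loss (reference math only, with illustrative names; not rlox's implementation):

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss: positive errors weighted by tau, negative by 1 - tau."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2
```

With tau > 0.5 the loss penalizes underestimation more lightly, which is what lets IQL approximate a maximum over actions without querying out-of-distribution actions.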

LLM Post-Training

  • GRPO
  • DPO
  • OnlineDPO
  • BestOfN
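The LLM post-training methods above share a common primitive: a preference loss over (chosen, rejected) completion pairs. As a refresher, a minimal NumPy sketch of the standard DPO objective (reference math with illustrative names, not rlox's implementation):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO: -log sigmoid of the beta-scaled implicit reward margin."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))
```

At zero margin the loss is log 2 (chance level); it decreases as the policy prefers the chosen completion more than the reference model does.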

Model-Based

  • DreamerV3

Hybrid

  • HybridPPO (Candle)

Installation

```bash
# From PyPI
pip install rlox

# From source (requires Rust toolchain)
git clone https://github.com/riserally/rlox.git
cd rlox && pip install -e ".[dev]"
```

Getting Started

Installation, first training run, step-by-step tutorial.

Python Guide

Complete user guide with 3-level API reference.

Examples

Copy-paste code for every algorithm.

Rust API

Auto-generated crate documentation.

Tutorials

Custom networks, exploration, offline RL, Candle hybrid.

Blog

Benchmarks, deep-dives, and release notes.