ReinforcementLearning.jl, as the name suggests, is a package for reinforcement learning research in Julia.
Our design principles are:
- Reusability and extensibility: Provide carefully designed components and interfaces to help users implement new algorithms.
- Easy experimentation: Make it easy for new users to run benchmark experiments, compare different algorithms, evaluate and diagnose agents.
- Reproducibility: Facilitate reproducibility from traditional tabular methods to modern deep reinforcement learning algorithms.
Get Started
julia> ] add ReinforcementLearning
julia> using ReinforcementLearning
julia> run(E`JuliaRL_BasicDQN_CartPole`)
JuliaRL_BasicDQN_CartPole
This experiment uses three dense layers to approximate the Q value. The testing environment is CartPoleEnv.
Agent and statistic info will be saved to: /home/runner/work/JuliaReinforcementLearning.github.io/JuliaReinforcementLearning.github.io/checkpoints/JuliaRL_BasicDQN_CartPole_2020_08_06_14_51_55
You can also view the tensorboard logs with tensorboard --logdir /home/runner/work/JuliaReinforcementLearning.github.io/JuliaReinforcementLearning.github.io/checkpoints/JuliaRL_BasicDQN_CartPole_2020_08_06_14_51_55/tb_log
To load the agent and statistic info:
agent = RLCore.load("/home/runner/work/JuliaReinforcementLearning.github.io/JuliaReinforcementLearning.github.io/checkpoints/JuliaRL_BasicDQN_CartPole_2020_08_06_14_51_55", Agent)
BSON.@load joinpath("/home/runner/work/JuliaReinforcementLearning.github.io/JuliaReinforcementLearning.github.io/checkpoints/JuliaRL_BasicDQN_CartPole_2020_08_06_14_51_55", "stats.bson") total_reward_per_episode time_per_step
ReinforcementLearningCore.Experiment
├─ agent => ReinforcementLearningCore.Agent
│ ├─ policy => ReinforcementLearningCore.QBasedPolicy
│ │ ├─ learner => ReinforcementLearningZoo.BasicDQNLearner
│ │ │ ├─ approximator => ReinforcementLearningCore.NeuralNetworkApproximator
│ │ │ │ ├─ model => Flux.Chain
│ │ │ │ │ └─ layers
│ │ │ │ │ ├─ 1
│ │ │ │ │ │ └─ Flux.Dense
│ │ │ │ │ │ ├─ W => 128×4 Array{Float32,2}
│ │ │ │ │ │ ├─ b => 128-element Array{Float32,1}
│ │ │ │ │ │ └─ σ => typeof(NNlib.relu)
│ │ │ │ │ ├─ 2
│ │ │ │ │ │ └─ Flux.Dense
│ │ │ │ │ │ ├─ W => 128×128 Array{Float32,2}
│ │ │ │ │ │ ├─ b => 128-element Array{Float32,1}
│ │ │ │ │ │ └─ σ => typeof(NNlib.relu)
│ │ │ │ │ └─ 3
│ │ │ │ │ └─ Flux.Dense
│ │ │ │ │ ├─ W => 2×128 Array{Float32,2}
│ │ │ │ │ ├─ b => 2-element Array{Float32,1}
│ │ │ │ │ └─ σ => typeof(identity)
│ │ │ │ └─ optimizer => Flux.Optimise.ADAM
│ │ │ │ ├─ eta => 0.001
│ │ │ │ ├─ beta
│ │ │ │ │ ├─ 1
│ │ │ │ │ │ └─ 0.9
│ │ │ │ │ └─ 2
│ │ │ │ │ └─ 0.999
│ │ │ │ └─ state => IdDict
│ │ │ │ ├─ ht => 32-element Array{Any,1}
│ │ │ │ ├─ count => 0
│ │ │ │ └─ ndel => 0
│ │ │ ├─ loss_func => typeof(ReinforcementLearningCore.huber_loss)
│ │ │ ├─ γ => 0.99
│ │ │ ├─ batch_size => 32
│ │ │ ├─ min_replay_history => 100
│ │ │ ├─ rng => Random.MersenneTwister
│ │ │ └─ loss => 0.0
│ │ └─ explorer => ReinforcementLearningCore.EpsilonGreedyExplorer
│ │ ├─ ϵ_stable => 0.01
│ │ ├─ ϵ_init => 1.0
│ │ ├─ warmup_steps => 0
│ │ ├─ decay_steps => 500
│ │ ├─ step => 1
│ │ ├─ rng => Random.MersenneTwister
│ │ └─ is_training => true
│ ├─ trajectory => 0-element ReinforcementLearningCore.Trajectory
│ │ ├─ state => 4×0 view(::ReinforcementLearningCore.CircularArrayBuffer{Float32,2}, :, 1:0) with eltype Float32
│ │ ├─ action => 0-element view(::ReinforcementLearningCore.CircularArrayBuffer{Int64,1}, 1:0) with eltype Int64
│ │ ├─ reward => 0-element ReinforcementLearningCore.CircularArrayBuffer{Float32,1}
│ │ ├─ terminal => 0-element ReinforcementLearningCore.CircularArrayBuffer{Bool,1}
│ │ ├─ next_state => 4×0 view(::ReinforcementLearningCore.CircularArrayBuffer{Float32,2}, :, 2:1) with eltype Float32
│ │ └─ next_action => 0-element view(::ReinforcementLearningCore.CircularArrayBuffer{Int64,1}, 2:1) with eltype Int64
│ ├─ role => DEFAULT_PLAYER
│ └─ is_training => true
├─ env => ReinforcementLearningEnvironments.CartPoleEnv: ReinforcementLearningBase.SingleAgent(),ReinforcementLearningBase.Sequential(),ReinforcementLearningBase.PerfectInformation(),ReinforcementLearningBase.Deterministic(),ReinforcementLearningBase.StepReward(),ReinforcementLearningBase.GeneralSum(),ReinforcementLearningBase.MinimalActionSet()
├─ stop_condition => ReinforcementLearningCore.StopAfterStep
│ ├─ step => 10000
│ ├─ cur => 1
│ └─ progress => ProgressMeter.Progress
├─ hook => ReinforcementLearningCore.ComposedHook
│ └─ hooks
│ ├─ 1
│ │ └─ ReinforcementLearningCore.TotalRewardPerEpisode
│ │ ├─ rewards => 0-element Array{Float64,1}
│ │ └─ reward => 0.0
│ ├─ 2
│ │ └─ ReinforcementLearningCore.TimePerStep
│ │ ├─ times => 0-element ReinforcementLearningCore.CircularArrayBuffer{Float64,1}
│ │ └─ t => 791027816977
│ ├─ 3
│ │ └─ ReinforcementLearningCore.DoEveryNStep
│ │ ├─ f => ReinforcementLearningZoo.var"#106#111"
│ │ ├─ n => 1
│ │ └─ t => 0
│ ├─ 4
│ │ └─ ReinforcementLearningCore.DoEveryNEpisode
│ │ ├─ f => ReinforcementLearningZoo.var"#108#113"
│ │ ├─ n => 1
│ │ └─ t => 0
│ └─ 5
│ └─ ReinforcementLearningCore.DoEveryNStep
│ ├─ f => ReinforcementLearningZoo.var"#110#115"
│ ├─ n => 10000
│ └─ t => 0
└─ description => "This experiment uses three dense layers to approximate the Q value...."
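The approximator printed above is an ordinary three-layer network of sizes 4 → 128 → 128 → 2 (see the `W => 128×4`, `128×128`, and `2×128` entries). As a dependency-free sketch of what that forward pass computes, here is the same shape written with plain arrays; the weights below are random placeholders, whereas in the experiment they are learned by the DQN update:

```julia
# Sketch of the 4 → 128 → 128 → 2 Q-network shown in the printout above,
# using plain arrays instead of Flux layers. Weights are random placeholders.
relu(x) = max(x, zero(x))

# A dense layer: x ↦ σ.(W * x .+ b)
dense(W, b, σ) = x -> σ.(W * x .+ b)

# Layer shapes match the printout: W => 128×4, 128×128, 2×128.
layers = (
    dense(randn(Float32, 128, 4),   zeros(Float32, 128), relu),
    dense(randn(Float32, 128, 128), zeros(Float32, 128), relu),
    dense(randn(Float32, 2, 128),   zeros(Float32, 2),   identity),
)

# A CartPole state has 4 features; the output has one Q value per action.
q_values(s) = foldl((x, f) -> f(x), layers; init = s)

s = rand(Float32, 4)
@show length(q_values(s))   # one Q value for each of the 2 actions
```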
Check out the get started page for a more detailed explanation!
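Also visible in the result above is the `EpsilonGreedyExplorer`, which anneals ϵ linearly from `ϵ_init` to `ϵ_stable` over `decay_steps` steps (after `warmup_steps`). The helper below is an illustrative sketch of that schedule under the printed parameters, not the package API:

```julia
# Sketch of the linear ϵ-decay schedule of the EpsilonGreedyExplorer in the
# printout: ϵ starts at ϵ_init, decays linearly over decay_steps steps
# (after warmup_steps), then stays at ϵ_stable. `epsilon` is an
# illustrative helper, not a function exported by the package.
function epsilon(step; ϵ_init = 1.0, ϵ_stable = 0.01,
                 warmup_steps = 0, decay_steps = 500)
    if step <= warmup_steps
        ϵ_init
    elseif step >= warmup_steps + decay_steps
        ϵ_stable
    else
        frac = (step - warmup_steps) / decay_steps
        ϵ_init + frac * (ϵ_stable - ϵ_init)
    end
end

epsilon(1)      # ≈ 1.0: almost fully random at the start
epsilon(250)    # ≈ 0.5: halfway through the decay
epsilon(10_000) # 0.01: mostly greedy for the rest of training
```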
Project Structure
ReinforcementLearning.jl itself is just a wrapper around several other packages inside the JuliaReinforcementLearning org. The relationship between the different packages is described below:
+-------------------------------------------------------------------------------------+
|                                                                                     |
|  ReinforcementLearning.jl                                                           |
|                                                                                     |
|  +------------------------------+                                                   |
|  | ReinforcementLearningBase.jl |                                                   |
|  +--------|---------------------+                                                   |
|           |                                                                         |
|           |         +--------------------------------------+                        |
|           |         | ReinforcementLearningEnvironments.jl |                        |
|           |         |                                      |                        |
|           |         |      (Conditionally depends on)      |                        |
|           |         |                                      |                        |
|           |         |      ArcadeLearningEnvironment.jl    |                        |
|           +-------->+      OpenSpiel.jl                    |                        |
|           |         |      POMDPs.jl                       |                        |
|           |         |      PyCall.jl                       |                        |
|           |         |      ViZDoom.jl                      |                        |
|           |         |      GridWorlds.jl                   |                        |
|           |         +--------------------------------------+                        |
|           |                                                                         |
|           |         +------------------------------+                                |
|           +-------->+ ReinforcementLearningCore.jl |                                |
|                     +--------|---------------------+                                |
|                              |                                                      |
|                              |          +-----------------------------+             |
|                              +--------->+ ReinforcementLearningZoo.jl |             |
|                              |          +-----------------------------+             |
|                              |                                                      |
|                              |          +----------------------------------------+  |
|                              +--------->+ ReinforcementLearningAnIntroduction.jl |  |
|                                         +----------------------------------------+  |
|                                                                                     |
+-------------------------------------------------------------------------------------+
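Because the packages are layered, you can also install a subpackage on its own when you only need part of the stack, for example just the common interfaces defined in ReinforcementLearningBase.jl:

julia> ] add ReinforcementLearningBase
julia> using ReinforcementLearningBase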