CombinatorialBandits.CombinatorialInstanceType
CombinatorialInstance{T}

A combinatorial instance, also called an environment in reinforcement learning. An instance is the combination of a combinatorial structure (i.e. which sets of arms the bandit is allowed to play in the environment) and of the statistical properties of the arms (i.e. what reward the bandit gets when playing them).

Two properties are mandatory:

  • n_arms: the number of arms that are available in the instance (i.e. the dimension of the problem)
  • optimal_average_reward: the best reward that can be obtained at any round, in expectation
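As a quick illustration, a minimal sketch that reads these two mandatory properties off an instance; it assumes both properties are exposed as fields and that the instance was built elsewhere (constructors depend on the combinatorial structure):

# A minimal sketch: `instance` is assumed to be a CombinatorialInstance{Int}
# built elsewhere; both mandatory properties are assumed readable as fields.
function describe(instance::CombinatorialInstance{Int})
    println("Number of arms (problem dimension): ", instance.n_arms)
    println("Best expected reward per round:     ", instance.optimal_average_reward)
end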
CombinatorialBandits.StateType
State{T}

Memorises the (global) evolution of the bandit in the environment. This structure contains the information most bandit algorithms require to decide their next step.

  • round: the number of rounds (time steps) the bandit has already played
  • regret: the total regret the bandit has incurred (i.e. how much more reward it would have obtained had it always played the optimum solution)
  • reward: the total reward the bandit managed to gather
  • arm_counts: the number of times each arm (and not each solution, which is a set of arms) has been played. This information is used by most bandit algorithms
  • arm_reward: the total reward obtained by each arm
  • arm_average_reward: the average reward obtained by each arm. This information is used by most bandit algorithms
  • policy_extension: if a policy needs more information than this structure contains, it can store anything it might require under a key that uniquely identifies the policy
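For illustration, a minimal sketch that summarises a state; iterating the per-arm fields with pairs (which works for both vectors and dictionaries) is an assumption, not a documented guarantee:

# A minimal sketch: `state` is assumed to be a State{Int} maintained by the
# package; iterating the per-arm fields with `pairs` is an assumption.
function summarise(state::State{Int})
    println("Rounds played: ", state.round)
    println("Total reward:  ", state.reward, " (total regret: ", state.regret, ")")
    for (arm, count) in pairs(state.arm_counts)
        println("Arm ", arm, ": played ", count, " times, average reward ",
                state.arm_average_reward[arm])
    end
end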
CombinatorialBandits.TraceType
Trace{T}

Memorises all the details about the evolution of the bandit in the environment. This object can therefore grow very large for long episodes.

The information is stored in several vectors, all of the same length, equal to the number of rounds the bandit has played:

  • states: a list of states the bandit entered into over the episode
  • arms: a list of sets of arms that have been played over the episode
  • reward: a list of the rewards the policy received over the episode
  • policy_details: a list of policy-defined details over the episode (typically, computation times)
  • time_choose_action: a list of the times (in milliseconds) needed to make a decision at each round of the episode
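As a post-processing illustration, a minimal sketch over a finished episode's trace, using only the field names listed above:

# A minimal post-processing sketch over a finished episode's Trace{Int}.
cumulative_regret(trace::Trace{Int}) = [s.regret for s in trace.states]

function average_decision_time_ms(trace::Trace{Int})
    isempty(trace.time_choose_action) && return 0.0
    return sum(trace.time_choose_action) / length(trace.time_choose_action)
end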
Base.push!Method
push!(trace::Trace{T}, state::State{T}, arms::Vector{T}, reward::Vector{Float64}, policy_details::PolicyDetails,
      time_choose_action::Int) where T

Appends the arguments to the execution trace of the bandit algorithm. More specifically, trace's data structures are updated to also include state, arms, reward, policy_details, and time_choose_action (expressed in milliseconds).

All of these arguments are copied, except policy_details. (Indeed, the usual scenario is to keep updating the state, the chosen solution (set of arms), and the reward objects over the whole episode, but to build the details object from the ground up at each round.)
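A hedged usage sketch, assuming state, arms, reward, and details come out of one round of a bandit loop (their construction is not shown here):

elapsed_ms = 12   # hypothetical time measured around the decision step
push!(trace, state, arms, reward, details, elapsed_ms)
# state, arms, and reward are copied into the trace; details (a
# PolicyDetails) is stored as-is, so build a fresh one every round.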

CombinatorialBandits.all_arm_indicesFunction
all_arm_indices(reward)

Returns a list of arm indices for the given reward. For instance, for a vector of arms, it returns the list of indices in that vector: each index is associated with one arm.

all_arm_indices(instance::CombinatorialInstance{T}) where T

Returns a list of arm indices for the given combinatorial instance.
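A minimal sketch of the vector-based method; the reward values and the exact return container are assumptions:

reward = [0.2, 0.5, 0.9]   # hypothetical per-arm expected rewards
all_arm_indices(reward)    # one index per arm, e.g. [1, 2, 3]

# The instance-based method answers the same question for a built instance:
# all_arm_indices(instance)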

CombinatorialBandits.is_feasibleMethod
is_feasible(instance::CombinatorialInstance{T}, arms::Vector{T})

Returns whether the set of arms is a solution that can be played for the given combinatorial instance.
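A hedged sketch, assuming instance encodes a hypothetical constraint allowing at most two arms to be played at once:

is_feasible(instance, [1, 2])      # true: respects the (assumed) constraint
is_feasible(instance, [1, 2, 3])   # false: violates it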

CombinatorialBandits.is_partially_acceptableMethod
is_partially_acceptable(instance::CombinatorialInstance{T}, arms::Vector{T})

Returns whether the set of arms is either a solution or a subset of a solution that can be played for the given combinatorial instance. In some cases, a subset of a solution is not itself a solution: the distinction between the two cases is made by calling is_feasible.

Typically, is_partially_acceptable returns true for an empty set of arms: even if this is not an acceptable solution, adding elements to the empty set may yield a perfectly acceptable solution.
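A hedged sketch contrasting the two predicates, assuming a hypothetical instance that only accepts solutions of exactly two arms:

is_partially_acceptable(instance, Int[])   # true: extensible into a solution
is_feasible(instance, Int[])               # false: not a solution by itself
is_partially_acceptable(instance, [1])     # true: one more arm completes it
is_feasible(instance, [1, 2])              # true: a full solution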

CombinatorialBandits.optimise_linear_sqrtlinearMethod
optimise_linear_sqrtlinear(instance::CombinatorialInstance{T}, algo::ESCB2OptimisationAlgorithm, linear::Dict{T, Float64}, sqrtlinear::Dict{T, Float64}, sqrtlinear_weight::Float64, bandit_round::Int; with_trace::Bool=false) where T

Optimise ESCB2's objective function, with a linear term (coefficients given by linear) and a square-root term (coefficients given by sqrtlinear):

max  linear^T x + sqrtlinear_weight * sqrt(sqrtlinear^T x)
s.t. x belongs to the combinatorial set defined by instance

Several implementations are provided, and can be chosen with the appropriate subtype of ESCB2OptimisationAlgorithm.

Returns one (approximately) optimum solution. The exact guarantees depend on the chosen algorithm. If with_trace=true, a second value is returned with comprehensive details of the behaviour of the algorithm.
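A hedged usage sketch following the documented signature; instance and the concrete algorithm object algo (some subtype of ESCB2OptimisationAlgorithm) are assumed to be given, and the coefficient values are made up:

linear     = Dict(1 => 0.4, 2 => 0.7)   # hypothetical linear coefficients
sqrtlinear = Dict(1 => 0.1, 2 => 0.2)   # hypothetical square-root coefficients

solution = optimise_linear_sqrtlinear(instance, algo, linear, sqrtlinear, 2.0, 10)
solution, details = optimise_linear_sqrtlinear(instance, algo, linear, sqrtlinear,
                                               2.0, 10, with_trace=true)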

CombinatorialBandits.pullMethod
pull(instance::CombinatorialInstance{T}, arms::Vector{T}) where T

For the given bandit, plays the given solution (i.e. set of arms). This function returns both the reward and the regret this action incurred (zero if the action is optimum, strictly positive otherwise).
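A minimal loop sketch; it assumes the returned pair is ordered (reward, regret), and the fixed choice of arms is purely illustrative (a real run would query a policy each round):

# Assumes `instance` is built elsewhere and pull returns (reward, regret).
function run_fixed(instance, arms, n_rounds)
    total_regret = 0.0
    for _ in 1:n_rounds
        reward, regret = pull(instance, arms)
        total_regret += regret   # zero whenever `arms` is an optimum solution
    end
    return total_regret
end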