AlgorithmicCompetition.AIAPCEnvType
AIAPCEnv(p::AIAPCHyperParameters)

Build an environment to reproduce the results of Calvano, Calzolari, Denicolò & Pastorello (2020, American Economic Review):

Calvano, E., Calzolari, G., Denicolò, V., & Pastorello, S. (2020). Artificial Intelligence, Algorithmic Pricing, and Collusion. American Economic Review, 110(10), 3267–3297. https://doi.org/10.1257/aer.20190623
AlgorithmicCompetition.AIAPCHyperParametersType
AIAPCHyperParameters(
    α::Float64,
    β::Float64,
    δ::Float64,
    max_iter::Int,
    competition_solution_dict::Dict{Symbol,CompetitionSolution};
    convergence_threshold::Int = Int(1e5),
)

Hyperparameters which define a specific AIAPC environment.

AlgorithmicCompetition.AIAPCSummaryType
AIAPCSummary(α, β, is_converged, convergence_profit, iterations_until_convergence)

A struct to store the summary of an AIAPC experiment.

AlgorithmicCompetition.ConvergenceCheckType
ConvergenceCheck(convergence_threshold::Int64)

Hook that checks convergence, defined as the best response for each state remaining stable for a given number of iterations.
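
A minimal sketch of this rule in plain Julia, independent of the package's hook machinery: track how many consecutive iterations the per-state best responses (the argmax of each Q-matrix column) have gone unchanged, and compare against the threshold.

```julia
# Sketch of the convergence rule: count consecutive iterations in which the
# best response (argmax of each Q-matrix column) is unchanged for every state.
function stable_iterations(Q::Matrix{Float64}, n_iter::Int)
    best = [argmax(view(Q, :, s)) for s in axes(Q, 2)]
    stable = 0
    for _ in 1:n_iter
        # In a real run, Q would be updated here by the Q-learning step.
        newbest = [argmax(view(Q, :, s)) for s in axes(Q, 2)]
        stable = newbest == best ? stable + 1 : 0
        best = newbest
    end
    return stable
end

convergence_threshold = 5
Q = rand(3, 4)  # hypothetical Q-matrix: 3 price actions × 4 states
is_converged = stable_iterations(Q, 10) >= convergence_threshold
```

Because Q is never updated in this sketch, the best responses stay fixed and the counter reaches the threshold; in a live experiment the counter resets to zero whenever any state's best response changes.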

AlgorithmicCompetition.DDDCEnvType
DDDCEnv(p::DDDCHyperParameters)

Build an environment to reproduce the results of the Lewis (2023) extensions to AIAPC.

AlgorithmicCompetition.DDDCHyperParametersType
DDDCHyperParameters(
    α::Float64,
    β::Float64,
    δ::Float64,
    max_iter::Int,
    competition_solution_dict::Dict{Symbol,CompetitionSolution},
    data_demand_digital_params::DataDemandDigitalParams;
    convergence_threshold::Int = Int(1e5),
)

Hyperparameters which define a specific DDDC environment.

AlgorithmicCompetition.DDDCSummaryType
DDDCSummary(α, β, is_converged, data_demand_digital_params, convergence_profit, convergence_profit_demand_high, convergence_profit_demand_low, profit_gain, profit_gain_demand_high, profit_gain_demand_low, iterations_until_convergence, price_response_to_demand_signal_mse, percent_demand_high)

A struct to store the summary of a DDDC experiment.

AlgorithmicCompetition.AIAPCPolicyMethod
AIAPCPolicy(env::AIAPCEnv; mode = "baseline")

Create a policy for the AIAPC environment, with symmetric agents, using a tabular Q-learner. The mode determines how the Q-matrix is initialized.

AlgorithmicCompetition.AIAPCStopMethod
AIAPCStop(env::AIAPCEnv; stop_on_convergence = true)

Returns a stop condition that stops when the environment has converged for all players.

AlgorithmicCompetition.DDDCPolicyMethod
DDDCPolicy(env::DDDCEnv; mode = "baseline")

Create a policy for the DDDC environment, with symmetric agents, using a tabular Q-learner. The mode determines how the Q-matrix is initialized.

AlgorithmicCompetition.Q_i_0Method
Q_i_0(env::AIAPCEnv)

Calculate the Q-value for player i at time t=0, given the price chosen by player i and assuming random play over the price options of player -i.
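
The computation can be sketched as follows, using a hypothetical linear-demand profit function and discount factor as stand-ins for the paper's logit-demand profits: average the per-period profit over a uniformly random rival price, then convert to a present value via the geometric series 1 / (1 - δ).

```julia
# Hypothetical single-period profit function — a stand-in for the paper's
# logit-demand profits, used only to illustrate the computation.
profit(p_i, p_j) = (p_i - 1.0) * max(0.0, 2.0 - p_i + 0.5p_j)

price_options = [1.5, 1.8, 2.1]
δ = 0.95  # discount factor

# Expected per-period profit for each own price under uniform random rival
# play, discounted to a present value with 1 / (1 - δ).
Q_0 = [sum(profit(p_i, p_j) for p_j in price_options) / length(price_options) / (1 - δ)
       for p_i in price_options]
```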

AlgorithmicCompetition.Q_i_0Method
Q_i_0(env::DDDCEnv)

Calculate the Q-value for player i at time t=0, given the price chosen by player i and assuming random play over the price options of player -i, weighted by the demand state frequency.
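
A sketch of the demand-weighted variant, with hypothetical high/low-demand profit functions and an assumed state frequency (neither taken from the package): the expected profit is a frequency-weighted average over the two demand states before discounting.

```julia
# Hypothetical demand-dependent profit function; `a` is the demand intercept.
profit_demand(p_i, p_j, a) = (p_i - 1.0) * max(0.0, a - p_i + 0.5p_j)

price_options = [1.5, 1.8, 2.1]
δ = 0.95
freq_high = 0.5            # hypothetical frequency of the high-demand state
a_high, a_low = 2.0, 1.5   # hypothetical demand intercepts

# Per-period profit averaged over random rival play, weighted by demand-state frequency.
expected_profit(p_i) = sum(freq_high * profit_demand(p_i, p_j, a_high) +
                           (1 - freq_high) * profit_demand(p_i, p_j, a_low)
                           for p_j in price_options) / length(price_options)

Q_0 = [expected_profit(p_i) / (1 - δ) for p_i in price_options]
```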

AlgorithmicCompetition.construct_AIAPC_profit_arrayMethod
construct_AIAPC_profit_array(price_options, params, n_players)

Construct a 3-dimensional array which holds the profit for each player given a price pair. The first dimension is player 1's action, the second dimension is player 2's action, and the third dimension is the player index for their profit.
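
The indexing convention can be illustrated with a hypothetical profit function (the package derives profits from the competition solution parameters instead):

```julia
# Illustrative profit function, for the array layout only.
profit(p_own, p_other) = (p_own - 1.0) * max(0.0, 2.0 - p_own + 0.5p_other)

price_options = [1.5, 1.8, 2.1]
n_players = 2
n_prices = length(price_options)

# profit_array[i, j, k]: profit of player k when player 1 charges price i
# and player 2 charges price j.
profit_array = [k == 1 ? profit(price_options[i], price_options[j]) :
                         profit(price_options[j], price_options[i])
                for i in 1:n_prices, j in 1:n_prices, k in 1:n_players]
```

With symmetric firms, swapping the two price indices swaps the two players' profits, which the test below checks.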

AlgorithmicCompetition.construct_DDDC_profit_arrayMethod
construct_DDDC_profit_array(price_options, params, n_players)

Construct a 3-dimensional array which holds the profit for each player given a price pair. The first dimension is player 1's action, the second dimension is player 2's action, and the third dimension is the player index for their profit.

AlgorithmicCompetition.extract_sim_resultsMethod
extract_sim_results(exp_list::Vector{AIAPCSummary})

Extract the results of a simulation experiment from a list of AIAPCSummary objects and return them as a DataFrame.

AlgorithmicCompetition.get_convergence_profit_from_envMethod
get_convergence_profit_from_env(env::DDDCEnv, policy::MultiAgentPolicy)

Return the average profit of each agent after convergence, over the convergence state or states (in the case of a cycle), along with the average profit in the high- and low-demand states.

AlgorithmicCompetition.get_optimal_actionMethod
get_optimal_action(env::AIAPCEnv, policy::MultiAgentPolicy, last_observed_state)

Get the optimal action (best response) for each player, given the current policy and the last observed state.

AlgorithmicCompetition.run_aiapcMethod
run_aiapc(
    n_parameter_iterations = 1,
    max_iter = Int(1e9),
    convergence_threshold = Int(1e5),
    max_alpha = 0.25,
    max_beta = 2,
    sample_fraction = 1,
)

Run AIAPC, given a configuration for a set of experiments.

AlgorithmicCompetition.run_dddcMethod
run_dddc(
    n_parameter_iterations = 1,
    max_iter = Int(1e9),
    convergence_threshold = Int(1e5),
    n_grid_increments = 100,
)

Run DDDC, given a configuration for a set of experiments.

ReinforcementLearningBase.act!Method
RLBase.act!(env::AIAPCEnv, price_tuple::Tuple{Int64,Int64})

Act in the environment by setting the memory to the given price tuple and setting is_done to true.

ReinforcementLearningBase.act!Method
RLBase.act!(env::DDDCEnv, price_tuple::Tuple{Int64,Int64})

Act in the environment by setting the memory to the given price tuple and setting is_done to true.

ReinforcementLearningBase.rewardMethod
RLBase.reward(env::AIAPCEnv, p::Int)

Return the reward for the current state for the player with integer index p. If the episode is done, return the profit; otherwise return 0.

ReinforcementLearningBase.rewardMethod
RLBase.reward(env::AIAPCEnv, player::Player)

Return the reward for the current state for player. If the episode is done, return the profit, else return 0.

ReinforcementLearningBase.rewardMethod
RLBase.reward(env::AIAPCEnv)

Return the reward for the current state. If the episode is done, return the profits; otherwise return zero for both players.

ReinforcementLearningBase.stateMethod
RLBase.state(env::AIAPCEnv, player::Player)

Return the current state as an integer, mapped from the environment memory.
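
One way such a mapping can work (illustrative; the exact indexing scheme used by the package is an assumption here): treat the pair of last-period price indices as coordinates on the price grid and take their column-major linear index, which gives a unique integer per memory configuration.

```julia
n_prices = 15      # price options per player, as in the AIAPC price grid
memory = (3, 7)    # hypothetical last-period price indices (player 1, player 2)

# Column-major linear index over the n_prices × n_prices memory grid.
state = LinearIndices((n_prices, n_prices))[memory...]
```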

ReinforcementLearningBase.stateMethod
RLBase.state(env::DDDCEnv, player::Player)

Return the current state as an integer, mapped from the environment memory.