CausalELM

CausalELM.CausalELMModule

Macros, functions, and structs for applying ensembles of extreme learning machines to causal inference tasks where the counterfactual is unavailable or biased and must be predicted. Supports causal inference via interrupted time series designs, parametric G-computation, double machine learning, S-learning, T-learning, X-learning, R-learning, and doubly robust estimation.

For more details on Extreme Learning Machines see: Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. "Extreme learning machine: theory and applications." Neurocomputing 70, no. 1-3 (2006): 489-501.
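The estimators below share a common workflow: construct a model, call estimate_causal_effect!, and then summarize and, optionally, validate the result. A minimal sketch with simulated data (the data and settings are illustrative only):

julia> using CausalELM
julia> X, T, Y = rand(1000, 5), [rand() < 0.4 for i in 1:1000], rand(1000)
julia> m = GComputation(X, T, Y)
julia> estimate_causal_effect!(m)     # estimate the ATE
julia> summarize(m)                   # report the estimate and model details
julia> validate(m)                    # probe the identifying assumptions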

Types

CausalELM.InterruptedTimeSeriesType
InterruptedTimeSeries(X₀, Y₀, X₁, Y₁; kwargs...)

Initialize an interrupted time series estimator.

Arguments

  • X₀::Any: array or DataFrame of covariates from the pre-treatment period.
  • Y₀::Any: array or DataFrame of outcomes from the pre-treatment period.
  • X₁::Any: array or DataFrame of covariates from the post-treatment period.
  • Y₁::Any: array or DataFrame of outcomes from the post-treatment period.

Keywords

  • activation::Function=swish: activation function to use.
  • sample_size::Integer=size(X₀, 1): number of bootstrapped samples for the extreme learner.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X₀, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.
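For example, a smaller and faster ensemble can be requested through the keywords above (the values are illustrative only):

julia> X₀, Y₀, X₁, Y₁ = rand(100, 5), rand(100), rand(10, 5), rand(10)
julia> m_fast = InterruptedTimeSeries(X₀, Y₀, X₁, Y₁; sample_size=50, num_machines=20, num_neurons=16)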

References

For a simple linear regression-based tutorial on interrupted time series analysis see: Bernal, James Lopez, Steven Cummins, and Antonio Gasparrini. "Interrupted time series regression for the evaluation of public health interventions: a tutorial." International journal of epidemiology 46, no. 1 (2017): 348-355.

Examples

julia> X₀, Y₀, X₁, Y₁ =  rand(100, 5), rand(100), rand(10, 5), rand(10)
julia> m1 = InterruptedTimeSeries(X₀, Y₀, X₁, Y₁)
julia> m2 = InterruptedTimeSeries(X₀, Y₀, X₁, Y₁; num_machines=100)
julia> x₀_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100))
julia> y₀_df = DataFrame(y=rand(100))
julia> x₁_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100)) 
julia> y₁_df = DataFrame(y=rand(100))
julia> m3 = InterruptedTimeSeries(x₀_df, y₀_df, x₁_df, y₁_df)
CausalELM.GComputationType
GComputation(X, T, Y; kwargs...)

Initialize a G-Computation estimator.

Arguments

  • X::Any: array or DataFrame of covariates.
  • T::Any: vector or DataFrame of treatment statuses.
  • Y::Any: array or DataFrame of outcomes.

Keywords

  • quantity_of_interest::String: ATE for average treatment effect or ATT for average treatment effect on the treated.
  • activation::Function=swish: activation function to use.
  • sample_size::Integer=size(X, 1): number of bootstrapped samples for the extreme learners.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.

References

For a good overview of G-Computation see: Chatton, Arthur, Florent Le Borgne, Clémence Leyrat, Florence Gillaizeau, Chloé Rousseau, Laetitia Barbin, David Laplaud, Maxime Léger, Bruno Giraudeau, and Yohann Foucher. "G-computation, propensity score-based methods, and targeted maximum likelihood estimator for causal inference with different covariates sets: a comparative simulation study." Scientific reports 10, no. 1 (2020): 9219.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = GComputation(X, T, Y)
julia> m2 = GComputation(X, T, Y; quantity_of_interest="ATT")
julia> m3 = GComputation(X, T, Y; quantity_of_interest="ATE")

julia> x_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100))
julia> t_df, y_df = DataFrame(t=rand(0:1, 100)), DataFrame(y=rand(100)) 
julia> m4 = GComputation(x_df, t_df, y_df)
CausalELM.DoubleMachineLearningType
DoubleMachineLearning(X, T, Y; kwargs...)

Initialize a double machine learning estimator with cross fitting.

Arguments

  • X::Any: array or DataFrame of covariates of interest.
  • T::Any: vector or DataFrame of treatment statuses.
  • Y::Any: array or DataFrame of outcomes.

Keywords

  • activation::Function=swish: activation function to use.
  • sample_size::Integer=size(X, 1): number of bootstrapped samples for the extreme learners.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.
  • folds::Integer: number of folds to use for cross fitting.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.

References

For more information see: Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. "Double/debiased machine learning for treatment and structural parameters." (2016): C1-C68.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = DoubleMachineLearning(X, T, Y)

julia> x_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100))
julia> t_df, y_df = DataFrame(t=rand(0:1, 100)), DataFrame(y=rand(100))
julia> m2 = DoubleMachineLearning(x_df, t_df, y_df)
CausalELM.SLearnerType
SLearner(X, T, Y; kwargs...)

Initialize a S-Learner.

Arguments

  • X::Any: an array or DataFrame of covariates.
  • T::Any: a vector or DataFrame of treatment statuses.
  • Y::Any: an array or DataFrame of outcomes.

Keywords

  • activation::Function=swish: the activation function to use.
  • sample_size::Integer=size(X, 1): number of bootstrapped samples for the extreme learners.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.

References

For an overview of S-Learners and other metalearners see: Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116, no. 10 (2019): 4156-4165.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = SLearner(X, T, Y)
julia> m2 = SLearner(X, T, Y; num_neurons=64)
julia> m3 = SLearner(X, T, Y; activation=gelu)

julia> x_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100))
julia> t_df, y_df = DataFrame(t=rand(0:1, 100)), DataFrame(y=rand(100))
julia> m4 = SLearner(x_df, t_df, y_df)
CausalELM.TLearnerType
TLearner(X, T, Y; kwargs...)

Initialize a T-Learner.

Arguments

  • X::Any: an array or DataFrame of covariates.
  • T::Any: a vector or DataFrame of treatment statuses.
  • Y::Any: an array or DataFrame of outcomes.

Keywords

  • activation::Function=swish: the activation function to use.
  • sample_size::Integer=size(X, 1): number of bootstrapped samples for the extreme learners.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.

References

For an overview of T-Learners and other metalearners see: Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116, no. 10 (2019): 4156-4165.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = TLearner(X, T, Y)
julia> m2 = TLearner(X, T, Y; num_machines=100)

julia> x_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100))
julia> t_df, y_df = DataFrame(t=rand(0:1, 100)), DataFrame(y=rand(100))
julia> m3 = TLearner(x_df, t_df, y_df)
CausalELM.XLearnerType
XLearner(X, T, Y; kwargs...)

Initialize an X-Learner.

Arguments

  • X::Any: an array or DataFrame of covariates.
  • T::Any: a vector or DataFrame of treatment statuses.
  • Y::Any: an array or DataFrame of outcomes.

Keywords

  • activation::Function=swish: the activation function to use.
  • sample_size::Integer=size(X, 1): number of bootstrapped samples for the extreme learners.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.

References

For an overview of X-Learners and other metalearners see: Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116, no. 10 (2019): 4156-4165.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = XLearner(X, T, Y)
julia> m2 = XLearner(X, T, Y; num_machines=100)

julia> x_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100))
julia> t_df, y_df = DataFrame(t=rand(0:1, 100)), DataFrame(y=rand(100))
julia> m3 = XLearner(x_df, t_df, y_df)
CausalELM.RLearnerType
RLearner(X, T, Y; kwargs...)

Initialize an R-Learner.

Arguments

  • X::Any: an array or DataFrame of covariates of interest.
  • T::Any: a vector or DataFrame of treatment statuses.
  • Y::Any: an array or DataFrame of outcomes.

Keywords

  • activation::Function=swish: the activation function to use.
  • sample_size::Integer=size(X, 1): number of bootstrapped samples for the extreme learners.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.

References

For an explanation of R-Learner estimation see: Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." Biometrika 108, no. 2 (2021): 299-319.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = RLearner(X, T, Y)

julia> x_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100))
julia> t_df, y_df = DataFrame(t=rand(0:1, 100)), DataFrame(y=rand(100))
julia> m2 = RLearner(x_df, t_df, y_df)
CausalELM.DoublyRobustLearnerType
DoublyRobustLearner(X, T, Y; kwargs...)

Initialize a doubly robust CATE estimator.

Arguments

  • X::Any: an array or DataFrame of covariates of interest.
  • T::Any: a vector or DataFrame of treatment statuses.
  • Y::Any: an array or DataFrame of outcomes.

Keywords

  • activation::Function=swish: the activation function to use.
  • sample_size::Integer=size(X, 1): number of bootstrapped samples for the extreme learners.
  • num_machines::Integer=50: number of extreme learning machines for the ensemble.
  • num_feats::Integer=Int(round(0.75 * size(X, 2))): number of features to bootstrap for each learner in the ensemble.
  • num_neurons::Integer: number of neurons to use in the extreme learning machines.

Notes

To reduce the computational complexity you can reduce sample_size, num_machines, or num_neurons.

References

For an explanation of doubly robust CATE estimation see: Kennedy, Edward H. "Towards optimal doubly robust estimation of heterogeneous causal effects." Electronic Journal of Statistics 17, no. 2 (2023): 3008-3049.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = DoublyRobustLearner(X, T, Y)

julia> x_df = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100))
julia> t_df, y_df = DataFrame(t=rand(0:1, 100)), DataFrame(y=rand(100))
julia> m2 = DoublyRobustLearner(x_df, t_df, y_df)

julia> w = rand(100, 6)
julia> m3 = DoublyRobustLearner(X, T, Y, W=w)
CausalELM.ExtremeLearnerType
ExtremeLearner(X, Y, hidden_neurons, activation)

Construct an ExtremeLearner for fitting and prediction.

Notes

While it is possible to use an ExtremeLearner for regression, it is recommended to use RegularizedExtremeLearner, which imposes an L2 penalty, to reduce multicollinearity.

References

For more details see: Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. "Extreme learning machine: theory and applications." Neurocomputing 70, no. 1-3 (2006): 489-501.

Examples

julia> x, y = [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0], [0.0, 1.0, 0.0, 1.0]
julia> m1 = ExtremeLearner(x, y, 10, σ)
CausalELM.ELMEnsembleType
ELMEnsemble(X, Y, sample_size, num_machines, num_neurons)

Initialize a bagging ensemble of extreme learning machines.

Arguments

  • X::Array{Float64}: array of features for predicting labels.
  • Y::Array{Float64}: array of labels to predict.
  • sample_size::Integer: how many data points to use for each extreme learning machine.
  • num_machines::Integer: how many extreme learning machines to use.
  • num_feats::Integer: how many features to consider for each extreme learning machine.
  • num_neurons::Integer: how many neurons to use for each extreme learning machine.
  • activation::Function: activation function to use for the extreme learning machines.

Notes

ELMEnsemble uses the same bagging approach as random forests when the labels are continuous but uses the average predicted probability, rather than voting, for classification.
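As a rough illustration of that aggregation step (a sketch of the idea, not the internal implementation), assume preds holds one column of predicted probabilities per hypothetical machine:

julia> preds = rand(5, 3)                          # 5 observations × 3 hypothetical machines
julia> vec(sum(preds, dims=2) ./ size(preds, 2))   # average predicted probability per observation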

Examples

julia> X, Y =  rand(100, 5), rand(100)
julia> m1 = ELMEnsemble(X, Y, 10, 50, 5, 5, CausalELM.relu)
CausalELM.CountType

Type used to dispatch risk_ratio on count treatments

Activation Functions

CausalELM.binary_stepFunction
binary_step(x)

Apply the binary step activation function.

Examples

julia> binary_step(1)
1

julia> binary_step([-1000, 100, 1, 0, -0.001, -3])
6-element Vector{Int64}:
 0
 1
 1
 1
 0
 0
CausalELM.σFunction
σ(x)

Apply the sigmoid activation function.

Examples

julia> σ(1)
0.7310585786300049

julia> σ([1.0, 0.0])
2-element Vector{Float64}:
 0.7310585786300049
 0.5
CausalELM.tanhFunction
tanh(x)

Apply the hyperbolic tangent activation function.

Examples

julia> CausalELM.tanh([1.0, 0.0])
2-element Vector{Float64}:
 0.7615941559557649
 0.0
CausalELM.reluFunction
relu(x)

Apply the ReLU activation function.

Examples

julia> relu(1)
1

julia> relu([1.0, 0.0, -1.0])
3-element Vector{Float64}:
 1.0
 0.0
 0.0
CausalELM.leaky_reluFunction
leaky_relu(x)

Apply the leaky ReLU activation function to a number.

Examples

julia> leaky_relu(1)
1

julia> leaky_relu([-1.0, 0.0, 1.0])
3-element Vector{Float64}:
 -0.01
  0.0
  1.0
CausalELM.swishFunction
swish(x)

Apply the swish activation function to a number.

Examples

julia> swish(1)
0.7310585786300049

julia> swish([1.0, -1.0])
2-element Vector{Float64}:
  0.7310585786300049
 -0.2689414213699951
CausalELM.softmaxFunction
softmax(x)

Apply the softmax activation function to a number.

Examples

julia> softmax(1)
1.0

julia> softmax([1.0, 2.0, 3.0])
3-element Vector{Float64}:
 0.09003057317038045
 0.24472847105479764
 0.6652409557748219

julia> softmax([1.0 2.0 3.0; 4.0 5.0 6.0])
2×3 Matrix{Float64}:
 0.0900306  0.244728  0.665241
 0.0900306  0.244728  0.665241
CausalELM.softplusFunction
softplus(x)

Apply the softplus activation function to a number.

Examples

julia> softplus(1)
1.3132616875182228

julia> softplus([1.0, -1.0])
2-element Vector{Float64}:
 1.3132616875182228
 0.3132616875182228
CausalELM.geluFunction
gelu(x)

Apply the GeLU activation function to a number.

Examples

julia> gelu(1)
0.8411919906082768

julia> gelu([-1.0, 0.0])
2-element Vector{Float64}:
 -0.15880800939172324
  0.0
CausalELM.gaussianFunction
gaussian(x)

Apply the gaussian activation function to a real number.

Examples

julia> gaussian(1)
0.36787944117144233

julia> gaussian([1.0, -1.0])
2-element Vector{Float64}:
 0.3678794411714423
 0.3678794411714423
CausalELM.hard_tanhFunction
hard_tanh(x)

Apply the hard_tanh activation function to a number.

Examples

julia> hard_tanh(-2)
-1

julia> hard_tanh([-2.0, 0.0, 2.0])
3-element Vector{Real}:
 -1
  0.0
  1
CausalELM.elishFunction
elish(x)

Apply the ELiSH activation function to a number.

Examples

julia> elish(1)
0.7310585786300049

julia> elish([-1.0, 1.0])
2-element Vector{Float64}:
 -0.17000340156854793
  0.7310585786300049
CausalELM.fourierFunction
fourier(x)

Apply the Fourier activation function to a real number.

Examples

julia> fourier(1)
0.8414709848078965

julia> fourier([-1.0, 1.0])
2-element Vector{Float64}:
 -0.8414709848078965
  0.8414709848078965

Average Causal Effect Estimators

CausalELM.g_formula!Function
g_formula!(g)

Compute the G-formula for G-computation and S-learning.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = GComputation(X, T, Y)
julia> g_formula!(m1)

julia> m2 = SLearner(X, T, Y)
julia> g_formula!(m2)
CausalELM.predict_residualsFunction
predict_residuals(D, x_train, x_test, y_train, y_test, t_train, t_test)

Predict treatment and outcome residuals for double machine learning or R-learning.

Notes

This method should not be called directly.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> x_train, x_test = X[1:80, :], X[81:end, :]
julia> y_train, y_test = Y[1:80], Y[81:end]
julia> t_train, t_test = T[1:80], T[81:100]
julia> m1 = DoubleMachineLearning(X, T, Y)
julia> predict_residuals(m1, x_train, x_test, y_train, y_test, t_train, t_test)
CausalELM.moving_averageFunction
moving_average(x)

Calculates a cumulative moving average.

Examples

julia> moving_average([1, 2, 3])
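Assuming a cumulative moving average, i.e. the mean of all elements up to and including each index, the call above would be expected to return [1.0, 1.5, 2.0]; a stand-alone sketch of that calculation:

julia> x = [1, 2, 3]
julia> [sum(x[1:i]) / i for i in eachindex(x)]
3-element Vector{Float64}:
 1.0
 1.5
 2.0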

Metalearners

CausalELM.doubly_robust_formula!Function
doubly_robust_formula!(DRE, X, T, Y)

Estimate the CATE for a single cross fitting iteration via doubly robust estimation.

Notes

This method should not be called directly.

Arguments

  • DRE::DoublyRobustLearner: the DoublyRobustLearner struct to estimate the effect for.
  • X: a vector of three covariate folds.
  • T: a vector of three treatment folds.
  • Y: a vector of three outcome folds.

Examples

julia> X, T, Y, W =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100), rand(100, 6)
julia> m1 = DoublyRobustLearner(X, T, Y)

julia> X, T, W, Y = make_folds(m1)
julia> Z = m1.W == m1.X ? X : [reduce(hcat, (z)) for z in zip(X, W)]
julia> doubly_robust_formula!(m1, X, T, Y)
CausalELM.stage1!Function

stage1!(x)

Estimate the first stage models for an X-learner.

Notes

This method should not be called by the user.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = XLearner(X, T, Y)
julia> stage1!(m1)
CausalELM.stage2!Function

stage2!(x)

Estimate the second stage models for an X-learner.

Notes

This method should not be called by the user.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = XLearner(X, T, Y)
julia> stage1!(m1)
julia> stage2!(m1)

Common Methods

CausalELM.estimate_causal_effect!Function
estimate_causal_effect!(its)

Estimate the effect of an event relative to a predicted counterfactual.

Examples

julia> X₀, Y₀, X₁, Y₁ =  rand(100, 5), rand(100), rand(10, 5), rand(10)
julia> m1 = InterruptedTimeSeries(X₀, Y₀, X₁, Y₁)
julia> estimate_causal_effect!(m1)
estimate_causal_effect!(g)

Estimate a causal effect of interest using G-Computation.

Notes

If treatments are administered at multiple time periods, the effect will be estimated as the average difference between the outcome of being treated in all periods and being treated in no periods. For example, given that individuals 1, 2, ..., i ∈ I received either a treatment or a placebo in p different periods, the model would estimate the average treatment effect as E[Yᵢ|T₁=1, T₂=1, ... Tₚ=1, Xₚ] - E[Yᵢ|T₁=0, T₂=0, ... Tₚ=0, Xₚ].
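As a conceptual illustration only (not the package's internal code), this quantity can be approximated by fitting a single outcome model on the covariates and treatment and predicting under the all-treated and all-untreated regimes, here using the ExtremeLearner API documented elsewhere in this reference; the variable names are hypothetical:

julia> X, T, Y = rand(100, 5), Float64.([rand() < 0.4 for i in 1:100]), rand(100)
julia> m = ExtremeLearner([X T], Y, 10, σ)
julia> fit!(m)
julia> ŷ₁ = predict(m, [X ones(100)])     # everyone treated in all periods
julia> ŷ₀ = predict(m, [X zeros(100)])    # no one treated in any period
julia> sum(ŷ₁ .- ŷ₀) / length(Y)          # sketch of the ATE under the G-formula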

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = GComputation(X, T, Y)
julia> estimate_causal_effect!(m1)
estimate_causal_effect!(DML)

Estimate a causal effect of interest using double machine learning.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = DoubleMachineLearning(X, T, Y)
julia> estimate_causal_effect!(m1)

julia> W = rand(100, 6)
julia> m2 = DoubleMachineLearning(X, T, Y, W=W)
julia> estimate_causal_effect!(m2)
estimate_causal_effect!(s)

Estimate the CATE using an S-learner.

References

For an overview of S-learning see: Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116, no. 10 (2019): 4156-4165.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m4 = SLearner(X, T, Y)
julia> estimate_causal_effect!(m4)
estimate_causal_effect!(t)

Estimate the CATE using a T-learner.

References

For an overview of T-learning see: Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116, no. 10 (2019): 4156-4165.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m5 = TLearner(X, T, Y)
julia> estimate_causal_effect!(m5)
estimate_causal_effect!(x)

Estimate the CATE using an X-learner.

References

For an overview of X-learning see: Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116, no. 10 (2019): 4156-4165.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = XLearner(X, T, Y)
julia> estimate_causal_effect!(m1)
estimate_causal_effect!(R)

Estimate the CATE using an R-learner.

References

For an overview of R-learning see: Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." Biometrika 108, no. 2 (2021): 299-319.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = RLearner(X, T, Y)
julia> estimate_causal_effect!(m1)
estimate_causal_effect!(DRE)

Estimate the CATE using a doubly robust learner.

References

For details on how this method estimates the CATE see: Kennedy, Edward H. "Towards optimal doubly robust estimation of heterogeneous causal effects." Electronic Journal of Statistics 17, no. 2 (2023): 3008-3049.

Examples

julia> X, T, Y =  rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = DoublyRobustLearner(X, T, Y)
julia> estimate_causal_effect!(m1)

Inference

CausalELM.summarizeFunction
summarize(mod; kwargs...)

Get a summary from a CausalEstimator or Metalearner.

Arguments

  • mod::Union{CausalEstimator, Metalearner}: a model to summarize.

Keywords

  • n::Int=100: the number of iterations to generate the null distribution for randomization inference.
  • inference::Bool=false: whether to calculate p-values and standard errors.

Notes

p-values and standard errors are estimated using approximate randomization inference. If inference is set to true, this procedure takes a very long time due to repeated matrix inversions; see the usage note after the examples below.

References

For a primer on randomization inference see: https://www.mattblackwell.org/files/teaching/s05-fisher.pdf

Examples

julia> X, T, Y = rand(100, 5), [rand()<0.4 for i in 1:100], rand(100)
julia> m1 = GComputation(X, T, Y)
julia> estimate_causal_effect!(m1)
julia> summarize(m1)

julia> m2 = RLearner(X, T, Y)
julia> estimate_causal_effect!(m2)
julia> summarize(m2)

julia> m3 = SLearner(X, T, Y)
julia> estimate_causal_effect!(m3)
julia> summarise(m3)  # British spelling works too!
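Randomization inference can be requested explicitly through the keywords documented above; a hypothetical call with illustrative values (inference=true is slow):

julia> summarize(m1; n=1000, inference=true)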
summarize(its; kwargs...)

Get a summary from an interrupted time series estimator.

Arguments

  • its::InterruptedTimeSeries: interrupted time series estimator

Keywords

  • n::Int=100: number of iterations to generate the null distribution for randomization inference.
  • mean_effect::Bool=true: whether to estimate the mean or cumulative effect for an interrupted time series estimator.
  • inference::Bool=false: whether to calculate p-values and standard errors.

Notes

p-values and standard errors are estimated using approximate randomization inference. If inference is set to true, this procedure takes a very long time due to repeated matrix inversions.

Examples

julia> X₀, Y₀, X₁, Y₁ =  rand(100, 5), rand(100), rand(10, 5), rand(10)
julia> m4 = InterruptedTimeSeries(X₀, Y₀, X₁, Y₁)
julia> estimate_causal_effect!(m4)
julia> summarize(m4)
CausalELM.generate_null_distributionFunction
generate_null_distribution(mod, n)

Generate a null distribution for the treatment effect of G-computation, double machine learning, or metalearning.

Arguments

  • mod::Any: model to summarize.
  • n::Int=100: number of iterations to generate the null distribution for randomization inference.

Notes

This method estimates the same model that is provided using random permutations of the treatment assignment to generate a vector of estimated effects under different treatment regimes. When mod is a metalearner, the null statistic is the difference in the ATE.

Note that lowering the number of iterations increases the probability of failing to reject the null hypothesis.
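For intuition, a two-sided permutation p-value can be computed from such a null distribution as the share of permuted estimates at least as extreme as the observed one; a stand-alone sketch with made-up numbers, not this function's internal code:

julia> observed, null_dist = 0.12, 0.05 .* randn(500)   # hypothetical estimate and permutation nulls
julia> (1 + count(x -> abs(x) >= abs(observed), null_dist)) / (1 + length(null_dist))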

Examples

julia> x, t, y = rand(100, 5), [rand()<0.4 for i in 1:100], rand(1:100, 100, 1)
julia> g_computer = GComputation(x, t, y)
julia> estimate_causal_effect!(g_computer)
julia> generate_null_distribution(g_computer, 500)
generate_null_distribution(its, n, mean_effect)

Generate a null distribution for the treatment effect in an interrupted time series analysis.

Arguments

  • its::InterruptedTimeSeries: interrupted time series estimator
  • n::Int=100: number of iterations to generate the null distribution for randomization inference.
  • mean_effect::Bool=true: whether to estimate the mean or cumulative effect for an interrupted time series estimator.

Examples

julia> x₀, y₀, x₁, y₁ = rand(1:100, 100, 5), rand(100), rand(10, 5), rand(10)
julia> its = InterruptedTimeSeries(x₀, y₀, x₁, y₁)
julia> estimate_causal_effect!(its)
julia> generate_null_distribution(its, 10)
CausalELM.quantities_of_interestFunction
quantities_of_interest(mod, n)

Generate a p-value and standard error through randomization inference.

Notes

This method generates a null distribution of treatment effects by reestimating treatment effects from permutations of the treatment vector and estimates a p-value and standard error from the generated distribution.

Note that lowering the number of iterations increases the probability of failing to reject the null hypothesis.

References

For a primer on randomization inference see: https://www.mattblackwell.org/files/teaching/s05-fisher.pdf

Examples

julia> x, t, y = rand(100, 5), [rand()<0.4 for i in 1:100], rand(1:100, 100, 1)
julia> g_computer = GComputation(x, t, y)
julia> estimate_causal_effect!(g_computer)
julia> quantities_of_interest(g_computer, 1000)
quantities_of_interest(mod, n)

Generate a p-value and standard error through randomization inference.

Notes

This method generates a null distribution of treatment effects by reestimating treatment effects from permutations of the treatment vector and estimates a p-value and standard error from the generated distribution. Randomization for event studies is done by creating time splits at even intervals and reestimating the causal effect.

Note that lowering the number of iterations increases the probability of failing to reject the null hypothesis.

References

For a primer on randomization inference see: https://www.mattblackwell.org/files/teaching/s05-fisher.pdf

Examples

julia> x₀, y₀, x₁, y₁ = rand(1:100, 100, 5), rand(100), rand(10, 5), rand(10)
julia> its = InterruptedTimeSeries(x₀, y₀, x₁, y₁)
julia> estimate_causal_effect!(its)
julia> quantities_of_interest(its, 10)

Model Validation

CausalELM.validateFunction
validate(its; kwargs...)

Test the validity of an estimated interrupted time series analysis.

Arguments

  • its::InterruptedTimeSeries: an interrupted time series estimator.

Keywords

  • n::Int: number of times to simulate a confounder.
  • low::Float64=0.15: minimum proportion of data points to include before or after the tested break in the Wald supremum test.
  • high::Float64=0.85: maximum proportion of data points to include before or after the tested break in the Wald supremum test.

Notes

This method conducts a Chow Test, a Wald supremum test, and tests the model's sensitivity to confounders. The Chow Test tests for structural breaks in the covariates between the time before and after the event. p-values represent the proportion of times the magnitude of the break in a covariate would have been greater due to chance. Lower p-values suggest a higher probability the event affected the covariates, in which case they cannot provide unbiased counterfactual predictions.

The Wald supremum test finds the structural break with the highest Wald statistic. If this is not the same as the hypothesized break, it could indicate an anticipation effect, a confounding event, or that the intervention or policy took place in multiple phases. p-values represent the proportion of times we would see a larger Wald statistic if the data points were randomly allocated to pre- and post-event periods for the predicted structural break. Ideally, the hypothesized break will be the same as the predicted break and it will also have a low p-value.

The omitted predictors test adds normal random variables with uniform noise as predictors. If the included covariates are good predictors of the counterfactual outcome, adding irrelevant predictors should not have a large effect on the predicted counterfactual outcomes or the estimated effect.

This method does not implement the second test in Baicker and Svoronos because the estimator in this package models the relationship between covariates and the outcome and uses an extreme learning machine instead of linear regression, so variance in the outcome across different bins is not much of an issue.

References

For more details on the assumptions and validity of interrupted time series designs, see: Baicker, Katherine, and Theodore Svoronos. Testing the validity of the single interrupted time series design. No. w26080. National Bureau of Economic Research, 2019.

For a primer on randomization inference see: https://www.mattblackwell.org/files/teaching/s05-fisher.pdf

Examples

julia> X₀, Y₀, X₁, Y₁ = rand(100, 5), rand(100), rand(10, 5), rand(10)
julia> m1 = InterruptedTimeSeries(X₀, Y₀, X₁, Y₁)
julia> estimate_causal_effect!(m1)
julia> validate(m1)
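The test can also be tuned with the keywords documented above; a hypothetical call with non-default values:

julia> validate(m1; n=100, low=0.1, high=0.9)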
validate(m; kwargs...)

Test the counterfactual consistency, exchangeability, and positivity assumptions of an estimated model.

Arguments

  • m::Union{CausalEstimator, Metalearner}: model to validate/test the assumptions of.

Keywords

  • devs::Any: iterable of deviations from which to generate noise to simulate violations of the counterfactual consistency assumption.
  • num_iterations=10::Int: number of times to simulate a violation of the counterfactual consistency assumption.
  • min::Float64=1.0e-6: minimum probability of treatment for the positivity assumption.
  • high::Float64=1-min: maximum probability of treatment for the positivity assumption.

Notes

This method tests the counterfactual consistency, exchangeability, and positivity assumptions required for causal inference. Consistency and exchangeability are not directly testable, so these tests do not provide definitive evidence of a violation of these assumptions.

To probe the counterfactual consistency assumption, we simulate counterfactual outcomes that are different from the observed outcomes, estimate models with the simulated counterfactual outcomes, and take the averages. If the outcome is continuous, the noise for the simulated counterfactuals is drawn from N(0, dev) for each element in devs; otherwise the default is 0.25, 0.5, 0.75, and 1.0 standard deviations from the mean outcome. For discrete variables, each outcome is replaced with a different value in the range of outcomes with probability ϵ for each ϵ in devs; otherwise the default is 0.025, 0.05, 0.075, and 0.1. If the average estimate for a given level of violation differs greatly from the effect estimated on the actual data, then the model is very sensitive to violations of the counterfactual consistency assumption at that level.

Next, this method tests the model's sensitivity to a violation of the exchangeability assumption by calculating the E-value, which is the minimum strength of association, on the risk ratio scale, that an unobserved confounder would need to have with both the treatment and the outcome to fully explain away the estimated effect. Higher E-values imply the model is more robust to a violation of the exchangeability assumption.

Finally, this method tests the positivity assumption by estimating propensity scores. Rows in the returned matrix are levels of covariates that have a zero probability of treatment. If the matrix is empty, none of the observations have an estimated zero probability of treatment, which implies the positivity assumption is satisfied.

References

For a thorough review of causal inference assumptions see: Hernán, Miguel A., and James M. Robins. Causal Inference: What If. Boca Raton: Taylor and Francis, 2024.

For more information on the E-value test see: VanderWeele, Tyler J., and Peng Ding. "Sensitivity analysis in observational research: introducing the E-value." Annals of internal medicine 167, no. 4 (2017): 268-274.

Examples

julia> x, t, y = rand(100, 5), Float64.([rand()<0.4 for i in 1:100]), vec(rand(1:100, 100, 1)) 
julia> g_computer = GComputation(x, t, y, temporal=false)
julia> estimate_causal_effect!(g_computer)
julia> validate(g_computer)
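The keywords documented above can also be set explicitly; a hypothetical call with non-default values:

julia> validate(g_computer; devs=(0.5, 1.0), num_iterations=5, min=1.0e-4)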
CausalELM.covariate_independenceFunction
covariate_independence(its; kwargs...)

Test for independence between covariates and the event or intervention.

Arguments

  • its::InterruptedTimeSeries: an interrupted time series estimator.

Keywords

  • n::Int: number of permutations for assigning observations to the pre and post-treatment periods.

Notes

This is a Chow Test for covariates with p-values estimated via randomization inference, which does not assume a distribution for the outcome variable. The p-values are the proportion of times randomly assigning observations to the pre or post-intervention period would have a larger estimated effect on the slope of the covariates. The lower the p-values, the more likely it is that the event/intervention affected the covariates, in which case they cannot provide an unbiased prediction of the counterfactual outcomes.

References

For more information on using a Chow Test to test for structural breaks see: Baicker, Katherine, and Theodore Svoronos. Testing the validity of the single interrupted time series design. No. w26080. National Bureau of Economic Research, 2019.

For a primer on randomization inference see: https://www.mattblackwell.org/files/teaching/s05-fisher.pdf

Examples

julia> x₀, y₀, x₁, y₁ = (Float64.(rand(1:5, 100, 5)), randn(100), rand(1:5, (10, 5)), 
       randn(10))
julia> its = InterruptedTimeSeries(x₀, y₀, x₁, y₁)
julia> estimate_causal_effect!(its)
julia> covariate_independence(its)
CausalELM.omitted_predictorFunction
omitted_predictor(its; kwargs...)

See how an omitted predictor/variable could change the results of an interrupted time series analysis.

Arguments

  • its::InterruptedTimeSeries: an interrupted time series estimator.

Keywords

  • n::Int: number of times to simulate a confounder.

Notes

This method reestimates interrupted time series models with uniform random variables. If the included covariates are good predictors of the counterfactual outcome, adding a random variable as a covariate should not have a large effect on the predicted counterfactual outcomes and therefore the estimated average effect.

References

For a primer on randomization inference see: https://www.mattblackwell.org/files/teaching/s05-fisher.pdf

Examples

julia> x₀, y₀, x₁, y₁ = (Float64.(rand(1:5, 100, 5)), randn(100), rand(1:5, (10, 5)), randn(10))
julia> its = InterruptedTimeSeries(x₀, y₀, x₁, y₁)
julia> estimate_causal_effect!(its)
julia> omitted_predictor(its)
CausalELM.sup_waldFunction
sup_wald(its; kwargs...)

Check if the predicted structural break is the hypothesized structural break.

Arguments

  • its::InterruptedTimeSeries: an interrupted time series estimator.

Keywords

  • n::Int: number of times to simulate a confounder.
  • low::Float64=0.15: minimum proportion of data points to include before or after the tested break in the Wald supremum test.
  • high::Float64=0.85: maximum proportion of data points to include before or after the tested break in the Wald supremum test.

Notes

This method conducts Wald tests and identifies the structural break with the highest Wald statistic. If this break is not the same as the hypothesized break, it could indicate an anticipation effect, confounding by some other event or intervention, or that the intervention or policy took place in multiple phases. p-values are estimated using approximate randomization inference and represent the proportion of times we would see a larger Wald statistic if the data points were randomly allocated to pre and post-event periods for the predicted structural break.

References

For more information on testing for structural breaks see: Baicker, Katherine, and Theodore Svoronos. Testing the validity of the single interrupted time series design. No. w26080. National Bureau of Economic Research, 2019.

For a primer on randomization inference see: https://www.mattblackwell.org/files/teaching/s05-fisher.pdf

Examples

julia> x₀, y₀, x₁, y₁ = (Float64.(rand(1:5, 100, 5)), randn(100), rand(1:5, (10, 5)), 
       randn(10))
julia> its = InterruptedTimeSeries(x₀, y₀, x₁, y₁)
julia> estimate_causal_effect!(its)
julia> sup_wald(its)
CausalELM.p_valFunction
p_val(x, y, β; kwargs...)

Estimate the p-value for the hypothesis that an event had a statistically significant effect on the slope of a covariate using randomization inference.

Arguments

  • x::Array{<:Real}: covariates.
  • y::Array{<:Real}: outcome.
  • β::Array{<:Real}: fitted weights.

Keywords

  • two_sided::Bool=false: whether to conduct a two-sided hypothesis test.

Examples

julia> x, y, β = reduce(hcat, (float(rand(0:1, 10)), ones(10))), rand(10), 0.5
julia> p_val(x, y, β)
julia> p_val(x, y, β; n=100, two_sided=true)
CausalELM.counterfactual_consistencyFunction
counterfactual_consistency(m; kwargs...)

Arguments

  • m::Union{CausalEstimator, Metalearner}: model to validate/test the assumptions of.

Keywords

  • num_devs=(0.25, 0.5, 0.75, 1.0)::Tuple: number of standard deviations from which to generate noise from a normal distribution to simulate violations of the counterfactual consistency assumption.
  • num_iterations=10::Int: number of times to simulate a violation of the counterfactual consistency assumption.

Notes

Examine the counterfactual consistency assumption. First, this function simulates counterfactual outcomes that are offset from the outcomes in the dataset by random scalars drawn from N(0, dev). Then, the procedure is repeated num_iterations times and averaged. If the model is a metalearner, the estimated individual treatment effects are averaged and the mean CATE is averaged over all the iterations; otherwise the estimated treatment effect is averaged over the iterations. The previous steps are repeated for each element dev in num_devs.
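As a rough sketch of one simulated violation for a continuous outcome (mirroring the description above, not the package's internal code):

julia> y = rand(100)
julia> dev = 0.25                                   # one element of num_devs
julia> y_violated = y .+ dev .* randn(length(y))    # outcomes offset by N(0, dev) noise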

Examples

julia> x, t = rand(100, 5), Float64.([rand()<0.4 for i in 1:100])
julia> y = vec(rand(1:100, 100, 1))
julia> g_computer = GComputation(x, t, y, temporal=false)
julia> estimate_causal_effect!(g_computer)
julia> counterfactual_consistency(g_computer)
CausalELM.simulate_counterfactual_violationsFunction
simulate_counterfactual_violations(y, dev)

Simulate outcomes that violate the counterfactual consistency assumption.

Arguments

  • y::Vector{<:Real}: vector of real-valued outcomes.
  • dev::Float64: deviation of the observed outcomes from the true counterfactual outcomes.

Examples

julia> x, t, y = rand(100, 5), Float64.([rand()<0.4 for i in 1:100]), vec(rand(1:100, 100, 1)) 
julia> g_computer = GComputation(x, t, y, temporal=false)
julia> estimate_causal_effect!(g_computer)
julia> simulate_counterfactual_violations(y, 0.25)
CausalELM.exchangeabilityFunction
exchangeability(model)

Test the sensitivity of a G-computation estimator, doubly robust estimator, or metalearner to a violation of the exchangeability assumption.

References

For more information on the E-value test see: VanderWeele, Tyler J., and Peng Ding. "Sensitivity analysis in observational research: introducing the E-value." Annals of internal medicine 167, no. 4 (2017): 268-274.

Examples

julia> x, t = rand(100, 5), Float64.([rand()<0.4 for i in 1:100])
julia> y = vec(rand(1:100, 100, 1))
julia> g_computer = GComputation(x, t, y, temporal=false)
julia> estimate_causal_effect!(g_computer)
julia> exchangeability(g_computer)
CausalELM.e_valueFunction
e_value(model)

Test the sensitivity of an estimator to a violation of the exchangeability assumption.

References

For more information on the E-value test see: VanderWeele, Tyler J., and Peng Ding. "Sensitivity analysis in observational research: introducing the E-value." Annals of internal medicine 167, no. 4 (2017): 268-274.

Examples

julia> x, t = rand(100, 5), Float64.([rand()<0.4 for i in 1:100])
julia> y = vec(rand(1:100, 100, 1))
julia> g_computer = GComputation(x, t, y, temporal=false)
julia> estimate_causal_effect!(g_computer)
julia> e_value(g_computer)
CausalELM.binarizeFunction
binarize(x, cutoff)

Convert a vector of counts or a continuous vector to a binary vector.

Arguments

  • x::Any: iterable of numbers to binarize.
  • cutoff::Any: threshold after which numbers are converted to 1 and before which they are converted to 0.

Examples

julia> CausalELM.binarize([1, 2, 3], 2)
3-element Vector{Int64}:
 0
 0
 1
CausalELM.risk_ratioFunction
risk_ratio(model)

Calculate the risk ratio for an estimated model.

Notes

If the treatment variable is not binary and the outcome variable is not continuous then the treatment variable will be binarized.

References

For more information on how other quantities of interest are converted to risk ratios see: VanderWeele, Tyler J., and Peng Ding. "Sensitivity analysis in observational research: introducing the E-value." Annals of internal medicine 167, no. 4 (2017): 268-274.

Examples

julia> x, t = rand(100, 5), Float64.([rand()<0.4 for i in 1:100])
julia> y = vec(rand(1:100, 100, 1))
julia> g_computer = GComputation(x, t, y, temporal=false)
julia> estimate_causal_effect!(g_computer)
julia> risk_ratio(g_computer)
CausalELM.positivityFunction
positivity(model[, min][, max])

Find likely violations of the positivity assumption.

Notes

This method uses an extreme learning machine or regularized extreme learning machine to estimate probabilities of treatment. The returned matrix, which may be empty, contains the covariates that have a (near) zero probability of treatment or a (near) zero probability of being assigned to the control group, with their entry in the last column being their estimated treatment probability. In other words, they likely violate the positivity assumption.

Arguments

  • model::Union{CausalEstimator, Metalearner}: a model to validate/test the assumptions of.
  • min::Float64=1.0e-6: minimum probability of treatment for the positivity assumption.
  • high::Float64=1-min: the maximum probability of treatment for the positivity assumption.

Examples

julia> x, t = rand(100, 5), Float64.([rand()<0.4 for i in 1:100])
julia> y = vec(rand(1:100, 100, 1))
julia> g_computer = GComputation(x, t, y, temporal=false)
julia> estimate_causal_effect!(g_computer)
julia> positivity(g_computer)
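Because the returned matrix may be empty, a simple emptiness check is a convenient way to flag likely violations (a hypothetical follow-up to the example above):

julia> viol = positivity(g_computer)
julia> isempty(viol) || println("some observations may violate the positivity assumption")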

Validation Metrics

CausalELM.mseFunction
mse(y, ŷ)

Calculate the mean squared error

See also mae.

Examples

julia> mse([-1.0, -1.0, -1.0], [1.0, 1.0, 1.0])
4.0
CausalELM.maeFunction
mae(y, ŷ)

Calculate the mean absolute error

See also mse.

Examples

julia> mae([-1.0, -1.0, -1.0], [1.0, 1.0, 1.0])
2.0
CausalELM.accuracyFunction
accuracy(y, ŷ)

Calculate the accuracy for a classification task

Examples

julia> accuracy([1, 1, 1, 1], [0, 1, 1, 0])
0.5
CausalELM.precisionFunction
precision(y, ŷ)

Calculate the precision for a classification task

See also recall.

Examples

julia> CausalELM.precision([0, 1, 0, 0], [0, 1, 1, 0])
1.0
CausalELM.recallFunction
recall(y, ŷ)

Calculate the recall for a classification task

See also CausalELM.precision.

Examples

julia> recall([1, 2, 1, 3, 0], [2, 2, 2, 3, 1])
0.5
CausalELM.F1Function
F1(y, ŷ)

Calculate the F1 score for a classification task

Examples

julia> F1([1, 2, 1, 3, 0], [2, 2, 2, 3, 1])
0.4
CausalELM.confusion_matrixFunction
confusion_matrix(y, ŷ)

Generate a confusion matrix

Examples

julia> CausalELM.confusion_matrix([1, 1, 1, 1, 0], [1, 1, 1, 1, 0])
2×2 Matrix{Int64}:
 1  0
 0  4

Extreme Learning Machines

CausalELM.fit!Function
fit!(model)

Fit an ExtremeLearner to the data.

References

For more details see: Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. "Extreme learning machine: theory and applications." Neurocomputing 70, no. 1-3 (2006): 489-501.

Examples

julia> x, y = [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0], [0.0, 1.0, 0.0, 1.0]
julia> m1 = ExtremeLearner(x, y, 10, σ)
fit!(model)

Fit an ensemble of ExtremeLearners to the data.

Arguments

  • model::ELMEnsemble: ensemble of ExtremeLearners to fit.

Notes

This uses the same bagging approach as random forests when the labels are continuous but uses the average predicted probability, rather than voting, for classification.

Examples

julia> X, Y =  rand(100, 5), rand(100)
julia> m1 = ELMEnsemble(X, Y, 10, 50, 5, 5, CausalELM.relu)
julia> fit!(m1)
CausalELM.predictFunction
predict(model, X)

Use an ExtremeLearningMachine or ELMEnsemble to make predictions.

Notes

If using an ensemble to make predictions, this method returns a matrix where each row is a prediction and each column is a model.

References

For more details see: Huang G-B, Zhu Q-Y, Siew C. Extreme learning machine: theory and applications. Neurocomputing. 2006;70:489–501. https://doi.org/10.1016/j.neucom.2005.12.126

Examples

julia> x, y = [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0], [0.0, 1.0, 0.0, 1.0]
julia> m1 = ExtremeLearner(x, y, 10, σ)
julia> fit!(m1)
julia> predict(m1, [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0])

julia> X, Y = rand(100, 5), rand(100)
julia> m2 = ELMEnsemble(X, Y, 10, 50, 5, 5, CausalELM.relu)
julia> fit!(m2)
julia> predict(m2, X)
CausalELM.predict_counterfactual!Function
predict_counterfactual!(model, X)

Use an ExtremeLearningMachine to predict the counterfactual.

Notes

This should be run with the observed covariates. To use synthetic data for what-if scenarios use predict.

See also predict.

Examples

julia> x, y = [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0], [0.0, 1.0, 0.0, 1.0]
julia> m1 = ExtremeLearner(x, y, 10, σ)
julia> fit!(m1)
julia> predict_counterfactual!(m1, [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0])
CausalELM.placebo_testFunction
placebo_test(model)

Conduct a placebo test.

Notes

This method makes predictions for the post-event or post-treatment period using data from the pre-event or pre-treatment period and from the post-event or post-treatment period. If there is a statistically significant difference between these predictions the study design may be flawed. Due to the multitude of significance tests for time series data, this function returns the predictions but does not test for statistical significance.

Examples

julia> x, y = [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0], [0.0, 1.0, 0.0, 1.0]
julia> m1 = ExtremeLearner(x, y, 10, σ)
julia> fit!(m1)
julia> predict_counterfactual!(m1, [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0])
julia> placebo_test(m1)
CausalELM.set_weights_biasesFunction
set_weights_biases(model)

Calculate the weights and biases for an extreme learning machine.

Notes

Initialization is done using uniform Xavier initialization.
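For reference, uniform Xavier (Glorot) initialization draws weights from U(-b, b) with b = sqrt(6 / (fan_in + fan_out)); a stand-alone sketch, not this function's internal code:

julia> fan_in, fan_out = 5, 10                 # e.g. 5 input features and 10 hidden neurons
julia> b = sqrt(6 / (fan_in + fan_out))
julia> W = 2b .* rand(fan_in, fan_out) .- b    # entries uniform on (-b, b)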

References

For details see: Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. "Extreme learning machine: theory and applications." Neurocomputing 70, no. 1-3 (2006): 489-501.

Examples

julia> x, y = [1.0 1.0; 0.0 1.0; 0.0 0.0; 1.0 0.0], [0.0, 1.0, 0.0, 1.0]
julia> m1 = RegularizedExtremeLearner(x, y, 10, σ)
julia> set_weights_biases(m1)

Utility Functions

CausalELM.var_typeFunction
var_type(x)

Determine the type of variable held by a vector.

Examples

julia> CausalELM.var_type([1, 2, 3, 2, 3, 1, 1, 3, 2])
CausalELM.Count()
CausalELM.meanFunction
mean(x)

Calculate the mean of a vector.

Examples

julia> CausalELM.mean([1, 2, 3, 4])
2.5
CausalELM.varFunction
var(x)

Calculate the (sample) variance of a vector.

Examples

julia> CausalELM.var([1, 2, 3, 4])
1.6666666666666667
CausalELM.one_hot_encodeFunction
one_hot_encode(x)

One hot encode a categorical vector for multiclass classification.

Examples

julia> CausalELM.one_hot_encode([1, 2, 3, 4, 5])
5×5 Matrix{Float64}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0
CausalELM.clip_if_binaryFunction
clip_if_binary(x, var)

Constrain binary values between 1e-7 and 1 - 1e-7, otherwise return the original values.

Arguments

  • x::Array: array to clip if it is binary.
  • var: type of x based on calling var_type.

See also var_type.

Examples

julia> CausalELM.clip_if_binary([1.2, -0.02], CausalELM.Binary())
2-element Vector{Float64}:
 1.0
 0.0

julia> CausalELM.clip_if_binary([1.2, -0.02], CausalELM.Count())
2-element Vector{Float64}:
  1.2
 -0.02
CausalELM.@model_configMacro
model_config(effect_type)

Generate fields common to all CausalEstimator, Metalearner, and InterruptedTimeSeries structs.

Arguments

  • effect_type::String: "average_effect" or "individual_effect" to define fields for either models that estimate average effects or the CATE.

Examples

julia> struct TestStruct CausalELM.@model_config average_effect end
julia> TestStruct("ATE", false, "classification", true, relu, F1, 2, 10, 5, 100, 5, 5, 0.25)
TestStruct("ATE", false, "classification", true, relu, F1, 2, 10, 5, 100, 5, 5, 0.25)
CausalELM.@standard_input_dataMacro
standard_input_data()

Generate fields common to all CausalEstimators except DoubleMachineLearning and all Metalearners except RLearner and DoublyRobustLearner.

Examples

julia> struct TestStruct CausalELM.@standard_input_data end
julia> TestStruct([5.2], [0.8], [0.96])
TestStruct([5.2], [0.8], [0.96])
CausalELM.generate_foldsFunction
generate_folds(X, T, Y, folds)

Create folds for cross validation.

Examples

julia> xfolds, tfolds, yfolds = CausalELM.generate_folds(zeros(4, 2), zeros(4), ones(4), 2)
([[0.0 0.0], [0.0 0.0; 0.0 0.0; 0.0 0.0]], [[0.0], [0.0, 0.0, 0.0]], [[1.0], [1.0, 1.0, 1.0]])