Earth.EarthConfig
— Method
EarthConfig(; constraints=Set{Vector{Bool}}(), num_knots=20, maxit=10, maxorder=2, maxdegree=2,
            knot_penalty=ifelse(maxorder>1, 3, 2), min_r2=0.001, min_coef=0.01,
            prune::Bool=true, refit::Symbol=:lasso)
Keyword arguments
num_knots: The number of hinge function knots for each variable.
maxit: The number of basis construction iterations.
constraints: A set of bit vectors that constrain the combinations of variables that can be used to produce a term.
prune: If true, prune the model using the Lasso. If false, perform the basis construction step but skip the pruning step.
maxorder: The maximum number of distinct variables that can be present in a single term.
maxdegree: The maximum number of hinges for a single variable that can occur in one term.
knot_penalty: A parameter that controls how easily a new term can enter the model.
min_r2: Terminate the forward pass if the increase in R² falls below this value.
min_coef: Terms whose standardized coefficient falls below this value are pruned.
refit: After pruning, refit the model using either the Lasso (:lasso) or OLS (:ols).
Earth.gcv
— Method
gcv(E::EarthModel)
Returns the generalized cross validation (GCV) statistics at each step of the forward pass.
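For orientation, a GCV statistic in the sense of Friedman (1991) divides the mean squared residual by a complexity penalty. The sketch below is a minimal illustration, not the package's internal code. The helper name gcv_stat and the effective parameter count c are hypothetical, and how Earth counts effective parameters (including the role of knot_penalty) is not stated here.

```julia
# Hypothetical helper: GCV = (RSS / n) / (1 - c / n)^2, where
# rss is the residual sum of squares, n the number of observations,
# and c an assumed effective number of parameters (must be below n).
function gcv_stat(rss::Real, n::Integer, c::Real)
    c < n || throw(ArgumentError("effective parameter count must be below n"))
    return (rss / n) / (1 - c / n)^2
end

# For the same residual sum of squares, a larger effective parameter
# count inflates the statistic, which penalizes more complex models.
g_small = gcv_stat(10.0, 100, 5)
g_large = gcv_stat(10.0, 100, 20)
println((g_small, g_large))
```

With c = 0 the statistic reduces to the plain mean squared residual, so the denominator is what makes the criterion favor simpler models.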
Earth.gr2
— Method
gr2(E::EarthModel)
Returns the generalized R-square (adjusted for model complexity).
StatsAPI.fit
— Method
fit(EarthModel, X, y; config::EarthConfig=EarthConfig(), prune=true, verbosity=0)
Fit a regression model using an approach similar to Friedman's 1991 MARS procedure (also known as Earth for trademark reasons). The covariates are in X and the vector y contains the response values.
Earth/MARS involves two steps: a greedy basis construction step followed by a pruning step that aims to eliminate irrelevant terms. The basis functions are products of hinge functions. This implementation uses the Lasso instead of back-selection to prune the model.
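To make "products of hinge functions" concrete, here is a minimal sketch in plain Julia that does not use the package. The names hinge_pos, hinge_neg, and basis_term, and the (column, knot, direction) tuple layout, are hypothetical and chosen only for illustration. They are not part of Earth's API.

```julia
# A hinge is max(0, x - t); its mirror image is max(0, t - x), with knot t.
hinge_pos(x, t) = max(zero(x), x - t)
hinge_neg(x, t) = max(zero(x), t - x)

# Build one basis term as a product of hinges. Each element of `spec` is a
# hypothetical (column index, knot, direction) tuple; direction > 0 selects
# max(0, x - t), otherwise max(0, t - x).
function basis_term(X::AbstractMatrix, spec)
    b = ones(float(eltype(X)), size(X, 1))
    for (j, t, dir) in spec
        h = dir > 0 ? hinge_pos : hinge_neg
        b .*= h.(X[:, j], t)
    end
    return b
end

X = [0.0  1.0;
     2.0  3.0;
     4.0 -1.0]

# The term max(0, x1 - 1) * max(0, 2 - x2) involves two distinct variables.
term = basis_term(X, [(1, 1.0, +1), (2, 2.0, -1)])
println(term)
```

The term is nonzero only for rows where every hinge in the product is active, which is how such products can capture interactions that are local to a region of the covariate space.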
The covariates X can be a numeric Matrix, a data frame, a vector or tuple of vectors, or a named tuple whose values are vectors. In the latter two cases, each covariate vector must be of numeric or string type, or an instance of CategoricalArray. String and CategoricalArray covariates are expanded into binary indicator vectors.
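As an illustration of what expansion into binary indicator vectors means, the sketch below builds one 0/1 column per distinct level of a string vector. It is plain Julia with a hypothetical helper named indicators. The package's actual encoding (for example, level ordering or whether a reference level is dropped) is not specified here and may differ.

```julia
# Hypothetical helper: one indicator column per distinct level of v.
# Rows follow the order of v; columns follow the sorted levels.
function indicators(v::AbstractVector)
    levels = sort(unique(v))
    Z = [Float64(x == l) for x in v, l in levels]
    return levels, Z
end

levels, Z = indicators(["a", "b", "a", "c"])
println(levels)
println(Z)
```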
The config argument can be used to specify many aspects of how the model is fit. See EarthConfig for more specifics.
References:
Friedman (1991) "Multivariate Adaptive Regression Splines". Ann. Statist. 19(1): 1-67 (March, 1991). DOI: 10.1214/aos/1176347963 https://projecteuclid.org/journals/annals-of-statistics/volume-19/issue-1/Multivariate-Adaptive-Regression-Splines/10.1214/aos/1176347963.full
Keyword arguments
weights: Optional case weights.
config: Permits configuration of many tuning parameters.
verbosity: Print some information as the fitting algorithm runs.