Earth.EarthConfig - Method
EarthConfig(; constraints=Set{Vector{Bool}}(), num_knots=20, maxit=10, maxorder=2, maxdegree=2,
            knot_penalty=ifelse(maxorder>1, 3, 2), min_r2=0.001, min_coef=0.01,
            prune::Bool=true, refit::Symbol=:lasso)

Keyword arguments

  • num_knots: The number of hinge function knots for each variable.
  • maxit: The number of basis construction iterations.
  • constraints: A set of bit vectors that constrain the combinations of variables that can be used to produce a term.
  • maxorder: The maximum number of distinct variables that can be present in a single term.
  • maxdegree: The maximum number of hinges for a single variable that can occur in one term.
  • knot_penalty: A parameter that controls how easily a new term can enter the model. Defaults to 3 if maxorder > 1 and to 2 otherwise.
  • min_r2: Terminate the forward pass if the increase in R² falls below this value.
  • min_coef: Terms whose standardized coefficient falls below this value are pruned.
  • prune: If true, prune the model using the Lasso. If false, perform the basis construction step but skip the pruning step.
  • refit: After pruning, refit the model using either the Lasso (:lasso) or OLS (:ols).

Earth.gcv - Method
gcv(E::EarthModel)

Returns the generalized cross-validation (GCV) statistic at each step of the forward pass.

Earth.gr2 - Method
gr2(E::EarthModel)

Returns the generalized R-squared (R-squared adjusted for model complexity).
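
For orientation, here is a minimal sketch of how GCV and a generalized R-squared are conventionally defined in the MARS literature. It is not taken from this package's source: the function names gcv_stat and generalized_r2, the complexity measure nterms + penalty * nknots, and the default penalty of 3 are all assumptions made for illustration, and the package may compute these quantities differently.

```julia
# Illustrative only: textbook-style GCV, not necessarily the formula Earth.jl uses.

# rss: residual sum of squares, n: number of observations,
# nterms: number of basis functions (including the intercept),
# nknots: number of knots selected, penalty: cost charged per knot.
function gcv_stat(rss, n, nterms, nknots; penalty=3)
    effective_params = nterms + penalty * nknots   # assumed complexity measure
    return (rss / n) / (1 - effective_params / n)^2
end

# Generalized R-squared: compares the model's GCV with the GCV of an
# intercept-only model (one term, no knots).
function generalized_r2(rss, tss, n, nterms, nknots; penalty=3)
    gcv_model = gcv_stat(rss, n, nterms, nknots; penalty=penalty)
    gcv_null = gcv_stat(tss, n, 1, 0; penalty=penalty)
    return 1 - gcv_model / gcv_null
end

# Hypothetical numbers: 100 observations, 5 terms, 4 knots.
println(gcv_stat(10.0, 100, 5, 4))
println(generalized_r2(10.0, 50.0, 100, 5, 4))
```

With these definitions the generalized R-squared is always lower than the raw R-squared (1 - rss/tss) for a model with more than one term, because each extra term and knot inflates the model's GCV.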

StatsAPI.fit - Method
fit(EarthModel, X, y; config::EarthConfig=EarthConfig(), prune=true, verbosity=0)

Fit a regression model using an approach similar to Friedman's 1991 MARS (multivariate adaptive regression splines) procedure. Because MARS is a trademark, open implementations are commonly called Earth. The covariates are in X, and the vector y contains the response values.

Earth/MARS involves two steps: a greedy basis construction step followed by a pruning step that aims to eliminate irrelevant terms. The basis functions are products of hinge functions. This implementation uses the Lasso instead of backward selection to prune the model.
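
To make "products of hinge functions" concrete, the following sketch builds one basis column by hand. The helper names hinge and term and the knot locations 0.5 and 2.0 are hypothetical, chosen only for illustration. In the actual procedure the knots and variables are selected greedily from the data.

```julia
# Illustrative only: hinge functions and a product term, as used in MARS-style bases.

# A hinge is zero on one side of the knot and linear on the other.
# dir = +1 gives max(0, x - knot); dir = -1 gives max(0, knot - x).
hinge(x, knot; dir=1) = max(zero(x), dir * (x - knot))

# A basis term is a product of hinges, here over two different variables
# (an order-2 term in the vocabulary of `maxorder`).
term(x1, x2) = hinge(x1, 0.5) * hinge(x2, 2.0; dir=-1)

x1 = [0.0, 1.0, 2.0]
x2 = [1.0, 1.0, 3.0]
basis_column = term.(x1, x2)   # one column of the design matrix
println(basis_column)
```

The term is nonzero only where x1 exceeds 0.5 and x2 is below 2.0, which is how a product of hinges captures a localized interaction.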

The covariates X can be a numeric Matrix, a data frame, a vector or tuple of vectors, or a named tuple whose values are vectors. When X is a vector or tuple of vectors or a named tuple, each covariate vector must be of numeric or string type, or an instance of CategoricalArray. String and CategoricalArray covariates are expanded into binary indicator vectors.
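
As a rough picture of what expansion into binary indicator vectors means, the sketch below converts a string covariate into one 0/1 column per distinct level. The function name indicator_columns is hypothetical and this is not the package's own code. In particular, whether the package drops a reference level or orders the levels this way is not stated here.

```julia
# Illustrative only: expand a string covariate into 0/1 indicator vectors.
# Earth.jl's actual encoding (e.g. whether a reference level is dropped) may differ.
function indicator_columns(v::AbstractVector{<:AbstractString})
    levels = sort(unique(v))
    # One (level, column) pair per distinct value; the column is 1.0 where v equals the level.
    return [(level, Float64.(v .== level)) for level in levels]
end

color = ["red", "blue", "red", "green"]
for (level, column) in indicator_columns(color)
    println(level, " => ", column)
end
```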

The config argument can be used to specify many aspects of how the model is fit. See EarthConfig for more specifics.

References:

Friedman, J. H. (1991). "Multivariate Adaptive Regression Splines". Ann. Statist. 19(1): 1-67. DOI: 10.1214/aos/1176347963 https://projecteuclid.org/journals/annals-of-statistics/volume-19/issue-1/Multivariate-Adaptive-Regression-Splines/10.1214/aos/1176347963.full

Keyword arguments

  • weights: Optional case weights.
  • config: An EarthConfig that sets the tuning parameters.
  • verbosity: Print some information as the fitting algorithm runs.
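
A minimal usage sketch, based only on the signatures documented above. It assumes the Earth.jl package is installed and that EarthModel, EarthConfig, fit, gcv and gr2 can all be imported from the Earth module. The synthetic data, the seed and the chosen settings are made up for illustration.

```julia
# Sketch only: requires the Earth.jl package; the import list is an assumption.
using Earth: EarthModel, EarthConfig, fit, gcv, gr2
using Random

rng = Random.MersenneTwister(123)
n = 500
X = randn(rng, n, 3)                               # numeric Matrix of covariates
y = max.(0.0, X[:, 1]) .+ 0.1 .* randn(rng, n)     # synthetic response with one hinge effect

# Additive model (no interactions), refit by OLS after pruning.
cfg = EarthConfig(; maxorder=1, refit=:ols)
m = fit(EarthModel, X, y; config=cfg, verbosity=0)

println(gcv(m))   # GCV statistic at each step of the forward pass
println(gr2(m))   # generalized R-squared
```

Setting maxorder=1 restricts every term to a single variable, which also switches the default knot_penalty from 3 to 2 according to the EarthConfig signature.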