EarthConfig(; constraints=Set{Vector{Bool}}(), num_knots=20, maxit=10, maxorder=2, maxdegree=2,
            knot_penalty=ifelse(maxorder>1, 3, 2), min_r2=0.001, min_coef=0.01,
            prune::Bool=true, refit::Symbol=:lasso)

Keyword arguments

  • num_knots: The number of hinge function knots for each variable.
  • maxit: The number of basis construction iterations.
  • constraints: A set of bit vectors that constrain the combinations of variables that can be used to produce a term.
  • maxorder: The maximum number of distinct variables that can be present in a single term.
  • maxdegree: The maximum number of hinges for a single variable that can occur in one term.
  • knot_penalty: A parameter that controls how easily a new term can enter the model. Defaults to 3 if maxorder > 1 and to 2 otherwise.
  • min_r2: Terminate the forward pass if the increase in R2 falls below this value.
  • min_coef: Terms whose standardized coefficient falls below this value are pruned.
  • prune: If true, prune the model using the Lasso. If false, perform the basis construction step but skip the pruning step.
  • refit: After pruning, refit the model using either lasso (:lasso) or OLS (:ols).

Returns the generalized cross validation (GCV) statistics at each step of the forward pass.


Returns the generalized R-squared (adjusted for model complexity).
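As a rough illustration of the two statistics described above, the following sketch computes a GCV criterion in the form given by Friedman (1991) and a generalized R-squared derived from it. The function names `gcv_stat` and `generalized_r2`, the caller-supplied effective parameter count `c_eff`, and the definition of the generalized R-squared as one minus the ratio of the model GCV to the intercept-only GCV are all assumptions made for illustration. They are not taken from this package and its computation may differ.

```julia
# GCV criterion in the form given by Friedman (1991):
# mean squared residual inflated by a complexity penalty.
# `c_eff` (effective number of parameters) is supplied by the caller;
# how a particular implementation derives it is left open here.
gcv_stat(resid, c_eff) =
    (sum(abs2, resid) / length(resid)) / (1 - c_eff / length(resid))^2

# Assumed definition of a generalized R-squared: compare the model's GCV
# to that of an intercept-only model, which uses one parameter.
function generalized_r2(y, fitted, c_eff)
    ybar = sum(y) / length(y)
    gcv_model = gcv_stat(y .- fitted, c_eff)
    gcv_null = gcv_stat(y .- ybar, 1)
    return 1 - gcv_model / gcv_null
end

y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 2.0, 3.0, 4.0]   # a perfect fit, for illustration
r2 = generalized_r2(y, fitted, 2)
```

With a perfect fit the model GCV is zero, so `r2` equals 1 regardless of the complexity penalty. For imperfect fits, a larger `c_eff` lowers the statistic.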

fit(EarthModel, X, y; config::EarthConfig=EarthConfig(), prune=true, verbosity=0)

Fit a regression model using an approach similar to Friedman's 1991 MARS procedure (also known as Earth for trademark reasons). The covariates are in X and the vector y contains the response values.

Earth/MARS involves two steps: a greedy basis construction step followed by a pruning step that aims to eliminate irrelevant terms. The basis functions are products of hinge functions. This implementation uses the Lasso instead of backward selection to prune the model.
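To make "products of hinge functions" concrete, here is a minimal sketch of a single order-2 basis term. The helper names `hinge_right`, `hinge_left`, and `term`, and the knot values 0.5 and 2.0, are hypothetical and chosen only for illustration. They are not part of this package's API.

```julia
# Hinge functions: max(0, x - t) and its mirror image max(0, t - x),
# where t is the knot.
hinge_right(x, t) = max(zero(x), x - t)
hinge_left(x, t) = max(zero(x), t - x)

# A hypothetical order-2 basis term: the product of one hinge in each of
# two variables. The knots 0.5 and 2.0 are arbitrary illustrative values.
term(x1, x2) = hinge_right(x1, 0.5) * hinge_left(x2, 2.0)

# Evaluate the term on each row of a small covariate matrix.
X = [0.0 1.0;
     1.0 1.0;
     2.0 3.0]
b = [term(X[i, 1], X[i, 2]) for i in 1:size(X, 1)]
```

The term is nonzero only where both hinges are active, here only in the second row, which is how such products capture interactions that are local to a region of the covariate space.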

The covariates X can be a numeric Matrix, a data frame, a vector or tuple of vectors, or a named tuple whose values are vectors. In the last two cases (a vector or tuple of vectors, or a named tuple), each covariate vector must be of numeric or string type, or an instance of CategoricalArray. String and CategoricalArray covariates are expanded into binary indicator vectors.
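The indicator expansion mentioned above can be sketched as follows. The function `indicators` is hypothetical, and details such as the ordering of levels and whether a reference level is dropped are assumptions. This sketch keeps every level and does not reproduce the package's internal encoding.

```julia
# Expand a string-valued covariate into one 0/1 indicator vector per level.
# Levels are taken in sorted order; no reference level is dropped here,
# which may differ from what an actual implementation does.
function indicators(v::AbstractVector{<:AbstractString})
    levels = sort(unique(v))
    return Dict(lev => Float64.(v .== lev) for lev in levels)
end

color = ["red", "blue", "red", "green"]
ind = indicators(color)
```

Each entry of `ind` is a numeric vector with a 1 wherever the covariate equals that level, so a three-level covariate yields three indicator columns in this sketch.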

The config argument can be used to specify many aspects of how the model is fit. See EarthConfig for more specifics.


Friedman (1991) "Multivariate Adaptive Regression Splines". Ann. Statist. 19(1): 1-67 (March, 1991). DOI: 10.1214/aos/1176347963

Keyword arguments

  • weights: Optional case weights.
  • config: An EarthConfig that sets the tuning parameters.
  • verbosity: Print some information as the fitting algorithm runs.