Docstrings · AMLPipelineBase.jl

AMLPipelineBase.BaseFilters.Imputer — Type

Imputer(
   Dict(
      # Imputation strategy.
      # Statistic that takes a vector such as mean or median.
      :strategy => mean
   )
)

Imputes NaN values from Float64 features.

Implements fit! and transform.

AMLPipelineBase.BaseFilters.OneHotEncoder — Type

OneHotEncoder(Dict(
   # Nominal columns
   :nominal_columns => Int[],

   # Nominal column values map. Key is column index, value is list of
   # possible values for that column.
   :nominal_column_values_map => Dict{Int,Any}()
))

Transforms myinstances with nominal features into one-hot form and coerces the instance matrix to be of element type Float64.

Implements fit! and transform.

AMLPipelineBase.BaseFilters.Wrapper — Type

Wrapper(
   default_args = Dict(
      :name => "ohe-wrapper",
      # Transformer to call.
      :transformer => OneHotEncoder(),
      # Transformer args.
      :transformer_args => Dict()
   )
)

Wraps around a transformer.

Implements fit! and transform.

AMLPipelineBase.BaseFilters.createtransformer — Function

createtransformer(prototype::Transformer, args=Dict())

Create transformer

prototype: prototype transformer to base new transformer on
options: additional options to override prototype's options

Returns: new transformer.

AMLPipelineBase.Utils.aggregatorclskipmissing — Method

aggregatorclskipmissing(fn::Function)

Function to create aggregator closure with skipmissing features

AMLPipelineBase.Utils.createmachine — Function

createmachine(prototype::Machine, options=nothing)

Create machine

prototype: prototype machine to base new machine on
args: additional options to override prototype's options

Returns: new machine

AMLPipelineBase.Utils.find_catnum_columns — Function

find_catnum_columns(instances::DataFrame,maxuniqcat::Int=0)

Finds all categorial and numerical columns. Categorical columns are those that do not have Real type nor do all their elements correspond to Real. Also, columns with size of unique instances are less than maxuniqcat are considered categorical.

AMLPipelineBase.Utils.holdout — Method

holdout(n, right_prop)

Holdout method that partitions a collection into two partitions.

n: Size of collection to partition
right_prop: Percentage of collection placed in right partition

Returns: two partitions of indices, left and right

AMLPipelineBase.Utils.infer_eltype — Method

infer_eltype(vector::Vector)

Returns element type of vector unless it is Any. If Any, returns the most specific type that can be inferred from the vector elements.

vector: vector to infer element type on

Returns: inferred element type

AMLPipelineBase.Utils.kfold — Method

kfold(num_instances, num_partitions)

Returns k-fold partitions.

num_instances: total number of instances
num_partitions: number of partitions required

Returns: training set partition.

AMLPipelineBase.Utils.nested_dict_merge — Method

nested_dict_merge(first::Dict, second::Dict)

Second nested dictionary is merged into first.

If a second dictionary's value as well as the first are both dictionaries, then a merge is conducted between the two inner dictionaries. Otherwise the second's value overrides the first.

first: first nested dictionary
second: second nested dictionary

Returns: merged nested dictionary

AMLPipelineBase.Utils.nested_dict_set! — Method

nested_dict_set!(dict::Dict, keys::Array{T, 1}, value) where {T}

Set value in a nested dictionary.

dict: nested dictionary to assign value
keys: keys to access nested dictionaries in sequence
value: value to assign

AMLPipelineBase.Utils.nested_dict_to_tuples — Method

nested_dict_to_tuples(dict::Dict)

Converts nested dictionary to list of tuples

dict: dictionary that can have other dictionaries as values

Returns: list where elements are ([outer-key, inner-key, ...], value)

AMLPipelineBase.Utils.score — Method

score(metric::Symbol, actual::Vector, predicted::Vector)

Score learner predictions against ground truth values.

Available metrics:

:accuracy
metric: metric to assess with
actual: ground truth values
predicted: predicted values

Returns: score of learner

AMLPipelineBase.EnsembleMethods.BestLearner — Type

BestLearner(
   Dict(
      # Output to train against
      # (:class).
      :output => :class,
      # Function to return partitions of instance indices.
      :partition_generator => (instances, labels) -> kfold(size(instances, 1), 5),
      # Function that selects the best learner by index.
      # Arg learner_partition_scores is a (learner, partition) score matrix.
      :selection_function => (learner_partition_scores) -> findmax(mean(learner_partition_scores, dims=2))[2],      
      # Score type returned by score() using respective output.
      :score_type => Real,
      # Candidate learners.
      :learners => [PrunedTree(), Adaboost(), RandomForest()],
      # Options grid for learners, to search through by BestLearner.
      # Format is [learner_1_options, learner_2_options, ...]
      # where learner_options is same as a learner's options but
      # with a list of values instead of scalar.
      :learner_options_grid => nothing
   )
)

Selects best learner from the set by performing a grid search on learners if grid option is indicated.

AMLPipelineBase.EnsembleMethods.StackEnsemble — Type

StackEnsemble(
   Dict(    
      # Output to train against
      # (:class).
      :output => :class,
      # Set of learners that produce feature space for stacker.
      :learners => [PrunedTree(), Adaboost(), RandomForest()],
      # Machine learner that trains on set of learners' outputs.
      :stacker => RandomForest(),
      # Proportion of training set left to train stacker itself.
      :stacker_training_proportion => 0.3,
      # Provide original features on top of learner outputs to stacker.
      :keep_original_features => false
   )
)

An ensemble where a 'stack' of learners is used for training and prediction.

AMLPipelineBase.EnsembleMethods.VoteEnsemble — Type

VoteEnsemble(
   Dict( 
      # Output to train against
      # (:class).
      :output => :class,
      # Learners in voting committee.
      :learners => [PrunedTree(), Adaboost(), RandomForest()]
   )
)

Set of machine learners employing majority vote to decide prediction.

Implements: fit!, transform!

AMLPipelineBase.AbsTypes.fit! — Method

fit!(bls::BestLearner, instances::DataFrame, labels::Vector)

Training phase:

obtain learners as is if grid option is not present
generate learners if grid option is present
foreach prototype learner, generate learners with specific options found in grid
generate partitions
train each learner on each partition and obtain validation output

AMLPipelineBase.AbsTypes.fit! — Method

fit!(se::StackEnsemble, instances::DataFrame, labels::Vector)

Training phase of the stack of learners.

perform holdout to obtain indices for
partition learner and stacker training sets
partition training set for learners and stacker
train all learners
train stacker on learners' outputs
build final model from the trained learners

AMLPipelineBase.AbsTypes.fit! — Method

fit!(ve::VoteEnsemble, instances::DataFrame, labels::Vector)

Training phase of the ensemble.

AMLPipelineBase.AbsTypes.transform! — Method

transform!(bls::BestLearner, instances::DataFrame)

Choose the best learner based on cross-validation results and use it for prediction.

AMLPipelineBase.AbsTypes.transform! — Method

transform!(se::StackEnsemble, instances::DataFrame)

Build stacker instances and predict

AMLPipelineBase.AbsTypes.transform! — Method

transform!(ve::VoteEnsemble, instances::DataFrame)

Prediction phase of the ensemble.

AMLPipelineBase.AbsTypes.fit! — Method

fit!(mc::Machine, input::DataFrame, output::Vector)

Generic trait to be overloaded by different subtypes of Machine. Multiple dispatch for fit!.

AMLPipelineBase.AbsTypes.fit_transform! — Function

fit_transform!(mc::Machine, input::DataFrame, output::Vector)

Dynamic dispatch that calls in sequence fit! and transform! functions.

AMLPipelineBase.AbsTypes.transform! — Method

transform!(mc::Machine, input::DataFrame)

Generic trait to be overloaded by different subtypes of Machine. Multiple dispatch for transform!.

AMLPipelineBase.Pipelines.ComboPipeline — Type

ComboPipeline(machs::Vector{T}) where {T<:Machine}

Feature union pipeline which iteratively calls fit_transform of each element and concatenate their output into one dataframe.

Implements fit! and transform!.

AMLPipelineBase.Pipelines.Pipeline — Type

Pipeline(machs::Vector{<:Machine},args::Dict=Dict())

Linear pipeline which iteratively calls and passes the result of fit_transform to the succeeding elements in the pipeline.

Implements fit! and transform!.

AMLPipelineBase.Pipelines.Pipeline — Method

Pipeline(machs::Vararg{Machine})

Helper function for Pipeline structure.

AMLPipelineBase.Pipelines.Pipeline — Method

Pipeline(machs::Vector{<:Machine},args::Dict=Dict())

Helper function for Pipeline structure.

AMLPipelineBase.NARemovers.NARemover — Type

NARemover(
  Dict(
    :name => "nadetect",
    :acceptance => 0.10 # tolerable NAs percentage
  )
)

Removes columns with NAs greater than acceptance rate. This assumes that it processes columns of features. The output column should not be part of input to avoid it being excluded if it fails the acceptance critera.

Implements fit! and transform!.

AMLPipelineBase.NARemovers.NARemover — Method

NARemover(acceptance::Float64)

Helper function for NARemover.

AMLPipelineBase.AbsTypes.fit! — Function

fit!(nad::NARemover,features::DataFrame,labels::Vector=[])

Checks and exit of df is empty

Arguments

nad::NARemover: custom type
features::DataFrame: input
labels::Vector=[]:

AMLPipelineBase.AbsTypes.transform! — Method

transform!(nad::NARemover,nfeatures::DataFrame)

Removes columns with NAs greater than acceptance rate.

Arguments

nad::NARemover: custom type
nfeatures::DataFrame: input

AMLPipelineBase.FeatureSelectors.CatFeatureSelector — Type

CatFeatureSelector(Dict(:name => "catf"))

Automatically extract categorical columns based on inferred element types.

Implements fit! and transform!.

AMLPipelineBase.FeatureSelectors.CatNumDiscriminator — Type

CatNumDiscriminator(
   Dict(
      :name => "catnumdisc",
      :maxcategories => 24
   )
)

Transform numeric columns to string (as categories) if the count of their unique elements <= maxcategories.

Implements fit! and transform!.

AMLPipelineBase.FeatureSelectors.CatNumDiscriminator — Method

CatNumDiscriminator(maxcat::Int)

Helper function for CatNumDiscriminator.

AMLPipelineBase.FeatureSelectors.FeatureSelector — Type

FeatureSelector(
   Dict(
     :name => "featureselector",
     :columns => [col1, col2, ...]
   )
)

Returns a dataframe of the selected columns.

Implements fit! and transform!.

AMLPipelineBase.FeatureSelectors.FeatureSelector — Method

FeatureSelector(cols::Vararg{Int})

Helper function for FeatureSelector.

AMLPipelineBase.FeatureSelectors.FeatureSelector — Method

FeatureSelector(cols::Vector{Int})

Helper function for FeatureSelector.

AMLPipelineBase.FeatureSelectors.NumFeatureSelector — Type

NumFeatureSelector(Dict(:name=>"numfeatsel"))

Automatically extracts numeric features based on their inferred element types.

Implements fit! and transform!.

AMLPipelineBase.CrossValidators.crossvalidate — Method

crossvalidate(pl::Machine,X::DataFrame,Y::Vector,pfunc::Function,kfolds=10)

Run K-fold crossvalidation where:

pfunc is a performance metric
X and Y are input and target

AMLPipelineBase.BaselineModels.Baseline — Type

Baseline(
   default_args = Dict(
       :name => "baseline",
      :output => :class,
      :strat => mode
   )
)

Baseline model that returns the mode during classification.

AMLPipelineBase.BaselineModels.Baseline — Method

Baseline(name::String,opt...)

Helper function

AMLPipelineBase.BaselineModels.Identity — Type

Identity(args=Dict())

Returns the input as output.

AMLPipelineBase.BaselineModels.Identity — Method

Identity(name::String,opt...)

Helper function

AMLPipelineBase.AbsTypes.fit! — Function

fit!(idy::Identity,x::DataFrame,y::Vector)

Does nothing.

AMLPipelineBase.AbsTypes.fit! — Method

fit!(bsl::Baseline,x::DataFrame,y::Vector)

Get the mode of the training data.

AMLPipelineBase.AbsTypes.transform! — Function

transform!(idy::Identity,x::DataFrame)

Return the input as output.

AMLPipelineBase.AbsTypes.transform! — Method

transform!(bsl::Baseline,x::DataFrame)

Return the mode in classification.

AMLPipelineBase.DecisionTreeLearners.Adaboost — Type

Adaboost(
  Dict(
    :output => :class,
    :num_iterations => 7
  )
)

Adaboosted decision tree stumps. See DecisionTree.jl's documentation

Hyperparameters:

:num_iterations => 7 (number of iterations of AdaBoost)

Implements fit!, transform!

AMLPipelineBase.DecisionTreeLearners.PrunedTree — Type

PrunedTree(
  Dict(
    :purity_threshold => 1.0,
    :max_depth => -1,
    :min_samples_leaf => 1,
    :min_samples_split => 2,
    :min_purity_increase => 0.0
  )
)

Decision tree classifier. See DecisionTree.jl's documentation

Hyperparmeters:

:purity_threshold => 1.0 (merge leaves having >=thresh combined purity)
:max_depth => -1 (maximum depth of the decision tree)
:min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
:min_samples_split => 2 (the minimum number of samples in needed for a split)
:min_purity_increase => 0.0 (minimum purity needed for a split)

Implements fit!, transform!

AMLPipelineBase.DecisionTreeLearners.RandomForest — Type

RandomForest(
  Dict(
    :output => :class,
    :num_subfeatures => 0,
    :num_trees => 10,
    :partial_sampling => 0.7,
    :max_depth => -1
  )
)

Random forest classification. See DecisionTree.jl's documentation

Hyperparmeters:

:num_subfeatures => 0 (number of features to consider at random per split)
:num_trees => 10 (number of trees to train)
:partial_sampling => 0.7 (fraction of samples to train each tree on)
:max_depth => -1 (maximum depth of the decision trees)
:min_samples_leaf => 1 (the minimum number of samples each leaf needs to have)
:min_samples_split => 2 (the minimum number of samples in needed for a split)
:min_purity_increase => 0.0 (minimum purity needed for a split)

Implements fit!, transform!

AMLPipelineBase.AbsTypes.fit! — Method

fit!(adaboost::Adaboost, features::DataFrame, labels::Vector)

Optimize the hyperparameters of Adaboost instance.

AMLPipelineBase.AbsTypes.fit! — Method

fit!(tree::PrunedTree, features::DataFrame, labels::Vector)

Optimize the hyperparameters of PrunedTree instance.

AMLPipelineBase.AbsTypes.fit! — Method

fit!(forest::RandomForest, features::DataFrame, labels::Vector)

Optimize the parameters of the RandomForest instance.

AMLPipelineBase.AbsTypes.transform! — Method

transform!(adaboost::Adaboost, features::DataFrame)

Predict using the optimized hyperparameters of the trained Adaboost instance.

AMLPipelineBase.AbsTypes.transform! — Method

transform!(ptree::PrunedTree, features::DataFrame)

Predict using the optimized hyperparameters of the trained PrunedTree instance.

AMLPipelineBase.AbsTypes.transform! — Method

transform!(forest::RandomForest, features::DataFrame)

Predict using the optimized hyperparameters of the trained RandomForest instance.