The BetaML.Trees Module
Module Index
BetaML.Trees.DecisionNode
BetaML.Trees.Leaf
BetaML.Trees.Question
BetaML.Trees.buildTree
BetaML.Trees.findBestSplit
BetaML.Trees.infoGain
BetaML.Trees.match
BetaML.Trees.partition
BetaML.Trees.predict
BetaML.Trees.predict
BetaML.Trees.predictSingle
BetaML.Trees.predictSingle
Detailed API
Base.print
— Function
print(node)
Print a Decision Tree (textual)
BetaML.Trees.buildTree
— Function
buildTree(x, y, depth; maxDepth, minGain, minRecords, maxFeatures, splittingCriterion, forceClassification)
Builds (defines and trains) a Decision Tree.
Given a dataset of features x and the corresponding dataset of labels y, recursively build a decision tree by finding at each node the best question to split the data, until either the whole dataset is separated or a terminal condition is reached. The resulting tree is then returned.
Parameters:
x: The dataset's features (N × D)
y: The dataset's labels (N × 1)
depth: The current tree's depth. Used when calling the function recursively [def: 1]
maxDepth: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: N, i.e. no limits]
minGain: The minimum information gain required to allow a node's partition [def: 0]
minRecords: The minimum number of records a node must hold to be considered for partitioning [def: 2]
maxFeatures: The maximum number of (random) features to consider at each partitioning [def: D, i.e. look at all features]
splittingCriterion: Either gini, entropy or variance (see infoGain) [def: gini for categorical labels (classification task) and variance for numerical labels (regression task)]
forceClassification: Whether to force a classification task even if the labels are numerical (typically when labels are integers encoding some feature rather than representing a real cardinal measure) [def: false]
Notes:
Missing data (in the feature dataset) are supported.
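The recursion and the terminal conditions described above can be sketched as follows. This is a simplified illustration and not BetaML's implementation: the names LeafB, NodeB, gini_b, best_split_b and build_tree_sketch are hypothetical, only numerical features and the gini criterion are handled, missing values and maxFeatures are ignored, and the exact comparison used for maxDepth is an assumption.

```julia
# Hypothetical node types standing in for Leaf and DecisionNode.
struct LeafB
    labels::Vector
    depth::Int
end
struct NodeB
    column::Int
    value::Float64
    trueBranch::Union{NodeB,LeafB}
    falseBranch::Union{NodeB,LeafB}
    depth::Int
end

# Gini impurity: 1 minus the sum of the squared label frequencies.
function gini_b(y)
    isempty(y) && return 0.0
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    return 1.0 - sum((c / length(y))^2 for c in values(counts))
end

# Best (gain, column, value) over all numerical features; gain is 0.0 if no split helps.
function best_split_b(x, y)
    best = (gain = 0.0, column = 0, value = 0.0)
    parent = gini_b(y)
    for col in 1:size(x, 2), value in unique(x[:, col])
        mask = x[:, col] .>= value
        (all(mask) || !any(mask)) && continue   # the question must separate something
        p = count(mask) / length(y)
        gain = parent - p * gini_b(y[mask]) - (1 - p) * gini_b(y[.!mask])
        if gain > best.gain
            best = (gain = gain, column = col, value = value)
        end
    end
    return best
end

function build_tree_sketch(x, y, depth = 1; maxDepth = size(x, 1), minGain = 0.0, minRecords = 2)
    # Terminal conditions (assumed form): too few records, or maximum depth reached.
    (size(x, 1) < minRecords || depth >= maxDepth) && return LeafB(y, depth)
    s = best_split_b(x, y)
    # No question improves on the parent by more than minGain: stop here.
    s.gain <= minGain && return LeafB(y, depth)
    mask = x[:, s.column] .>= s.value
    trueBranch = build_tree_sketch(x[mask, :], y[mask], depth + 1;
                                   maxDepth = maxDepth, minGain = minGain, minRecords = minRecords)
    falseBranch = build_tree_sketch(x[.!mask, :], y[.!mask], depth + 1;
                                    maxDepth = maxDepth, minGain = minGain, minRecords = minRecords)
    return NodeB(s.column, s.value, trueBranch, falseBranch, depth)
end

x = reshape([1.0, 2.0, 3.0, 4.0], 4, 1)
y = ["no", "no", "yes", "yes"]
tree = build_tree_sketch(x, y)
stump = build_tree_sketch(x, y; maxDepth = 1)
```

In this toy dataset the root asks whether the first feature is at least 3.0, and both children become leaves because no further question yields a positive gain. With maxDepth = 1 the root itself is forced to be a leaf.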
BetaML.Trees.predict
— Method
predict(forest,x)
Predict the labels of a feature dataset.
For each record of the dataset and each tree of the "forest", recursively traverse the tree to find the most appropriate prediction for the given record. If the labels the trees were trained with are numeric, the prediction is also numeric: the mean of the individual trees' predictions, each of which is in turn the mean of the labels of the training records that ended up in that leaf node. If the labels were categorical, the prediction is a dictionary with the probability of each item (again, the probabilities of the individual trees are averaged to compose the forest's prediction).
For numerical predictions, use meanRelError(ŷ,y) to assess the mean relative error; for categorical predictions, you can use accuracy(ŷ,y).
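The averaging step described above can be illustrated with a small sketch. The helper names mean_sketch and average_pmfs_sketch are hypothetical and the per-tree predictions are made up for the example; the code only shows how numerical predictions and per-label probabilities could be combined across trees, not how BetaML does it internally.

```julia
# Numerical labels: the forest prediction is the mean of the trees' predictions.
mean_sketch(v) = sum(v) / length(v)

# Categorical labels: average the probability of each label across the trees.
# A label missing from a tree's dictionary counts as probability 0 for that tree.
function average_pmfs_sketch(pmfs)
    out = Dict{Any,Float64}()
    for pmf in pmfs, (label, p) in pmf
        out[label] = get(out, label, 0.0) + p / length(pmfs)
    end
    return out
end

# Made-up predictions of three trees for one record (regression).
numericForest = mean_sketch([2.0, 4.0, 6.0])

# Made-up predictions of two trees for one record (classification).
categoricalForest = average_pmfs_sketch([Dict("a" => 1.0), Dict("a" => 0.5, "b" => 0.5)])
```

Here numericForest is 4.0, and categoricalForest assigns probability 0.75 to "a" and 0.25 to "b".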
BetaML.Trees.predict
— Method
predict(tree,x)
Predict the labels of a feature dataset.
For each record of the dataset, recursively traverse the tree to find the most appropriate prediction for the given record. If the labels the tree was trained with are numeric, the prediction is also numeric. If the labels were categorical, the prediction is a dictionary with the probability of each item.
For numerical predictions, use meanRelError(ŷ,y) to assess the mean relative error; for categorical predictions, you can use accuracy(ŷ,y).
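As a rough illustration of the traversal, the sketch below walks a hand-built tree for each row of a small dataset. The types LeafSketch and NodeSketch and the functions predict_single_sketch and predict_sketch are hypothetical stand-ins; the comparison rule (">=" for numbers, "==" otherwise) follows the description of match, and missing values are not handled.

```julia
# Hypothetical stand-ins for Leaf and DecisionNode.
struct LeafSketch
    prediction::Any
end
struct NodeSketch
    column::Int
    value::Any
    trueBranch::Union{NodeSketch,LeafSketch}
    falseBranch::Union{NodeSketch,LeafSketch}
end

# A leaf simply returns its stored prediction.
predict_single_sketch(leaf::LeafSketch, record) = leaf.prediction

# A decision node answers its question and descends into the matching branch.
function predict_single_sketch(node::NodeSketch, record)
    v = record[node.column]
    answer = v isa Number ? v >= node.value : v == node.value
    return predict_single_sketch(answer ? node.trueBranch : node.falseBranch, record)
end

# Predict every row of the feature matrix.
predict_sketch(tree, x) = [predict_single_sketch(tree, x[i, :]) for i in 1:size(x, 1)]

# Hand-built tree: "column 1 >= 5.0?", then "column 2 == Green?" on the false branch.
tree = NodeSketch(1, 5.0,
                  LeafSketch(10.0),
                  NodeSketch(2, "Green", LeafSketch(1.0), LeafSketch(2.0)))

x = [6.0 "Red"; 2.0 "Green"; 2.0 "Red"]
ŷ = predict_sketch(tree, x)
```

The first record satisfies the root question and lands on the 10.0 leaf; the other two go down the false branch and are separated by the colour question.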
BetaML.Trees.DecisionNode
— Type
DecisionNode(question,trueBranch,falseBranch, depth)
A tree's non-terminal node.
Constructor's arguments and struct members:
question: The question asked in this node
trueBranch: A reference to the "true" branch of the tree
falseBranch: A reference to the "false" branch of the tree
depth: The node's depth in the tree
BetaML.Trees.Leaf
— Type
Leaf(y,depth)
A tree's leaf (terminal) node.
Constructor's arguments:
y: The labels associated with each record (either numerical or categorical)
depth: The node's depth in the tree
Struct members:
rawPredictions: Either the label counts or the numerical labels of the members of the node
predictions: Either the relative label counts (i.e. a PMF) or the mean
depth: The node's depth in the tree
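To make the two kinds of leaf prediction concrete, here is a minimal sketch of how a PMF or a mean could be derived from the labels that reach a leaf. The function leaf_prediction_sketch and its forceClassification keyword are illustrative assumptions modelled on the description above, not the actual Leaf constructor.

```julia
# Numerical labels give the mean; categorical labels give relative counts (a PMF).
# forceClassification treats numerical labels as categories, as described for buildTree.
function leaf_prediction_sketch(y; forceClassification = false)
    if eltype(y) <: Number && !forceClassification
        return sum(y) / length(y)
    end
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    return Dict(label => c / length(y) for (label, c) in counts)
end

regression = leaf_prediction_sketch([1.0, 2.0, 6.0])
pmf = leaf_prediction_sketch(["a", "a", "b", "a"])
forced = leaf_prediction_sketch([1, 1, 2, 1]; forceClassification = true)
```

The first call returns the mean 3.0, the second a dictionary giving "a" a probability of 0.75 and "b" 0.25, and the third treats the integers as class labels.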
BetaML.Trees.Question
— Type
Question
A question used to partition a dataset.
This struct just records a 'column number' and a 'column value' (e.g., Green).
BetaML.Trees.findBestSplit
— Method
findBestSplit(x,y;maxFeatures,splittingCriterion)
Find the best possible split of the dataset.
Find the best question to ask by iterating over every feature / value and calculating the information gain.
Parameters:
x: The feature dataset
y: The labels dataset
maxFeatures: Maximum number of (random) features to consider when looking for the "best split"
splittingCriterion: The metric used to define the "impurity" of the labels
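The search over features and values can be sketched as below. This is an illustration under several assumptions: the names gini_sketch and find_best_split_sketch are hypothetical, only the gini criterion is used, the random feature subset is drawn with shuffle, and missing values are not handled.

```julia
using Random

# Gini impurity: 1 minus the sum of the squared label frequencies.
function gini_sketch(y)
    isempty(y) && return 0.0
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    return 1.0 - sum((c / length(y))^2 for c in values(counts))
end

# Try every (feature, value) pair among a random subset of at most maxFeatures
# features and keep the question with the highest information gain.
function find_best_split_sketch(x, y; maxFeatures = size(x, 2), rng = Random.default_rng())
    n, d = size(x)
    parentImpurity = gini_sketch(y)
    bestGain, bestQuestion = 0.0, nothing
    features = shuffle(rng, 1:d)[1:min(maxFeatures, d)]
    for col in features, value in unique(x[:, col])
        mask = [v isa Number ? v >= value : v == value for v in x[:, col]]
        (all(mask) || !any(mask)) && continue   # the question must separate something
        p = count(mask) / n
        gain = parentImpurity - p * gini_sketch(y[mask]) - (1 - p) * gini_sketch(y[.!mask])
        if gain > bestGain
            bestGain, bestQuestion = gain, (column = col, value = value)
        end
    end
    return bestGain, bestQuestion
end

x = reshape([1.0, 2.0, 3.0, 4.0], 4, 1)
y = ["no", "no", "yes", "yes"]
gain, question = find_best_split_sketch(x, y)
```

With a single numerical feature the best question is "column 1 >= 3.0", which separates the two classes perfectly and yields a gain equal to the parent's gini impurity of 0.5.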
BetaML.Trees.infoGain
— Method
infoGain(left, right, parentUncertainty; splittingCriterion)
Compute the information gain of a specific partition.
Compute the "information gain" by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by their respective number of items.
Parameters:
leftY: Child #1 labels
rightY: Child #2 labels
parentUncertainty: "Impurity" of the labels of the parent node
splittingCriterion: Metric to adopt to determine the "impurity" (see below)
Three "impurity" metrics are supported:
gini (categorical)
entropy (categorical)
variance (numerical)
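The weighted difference described above, together with the three impurity metrics, can be sketched as follows. All function names here are hypothetical, and the precise definitions (base-2 entropy, population variance) are assumptions made for illustration rather than a statement of what BetaML computes.

```julia
# Relative frequency of each distinct label.
function class_freqs(y)
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    return [c / length(y) for c in values(counts)]
end

# Three impurity measures (assumed definitions).
gini_sketch(y) = isempty(y) ? 0.0 : 1.0 - sum(abs2, class_freqs(y))
entropy_sketch(y) = isempty(y) ? 0.0 : -sum(p * log2(p) for p in class_freqs(y))
variance_sketch(y) = isempty(y) ? 0.0 : sum(abs2, y .- sum(y) / length(y)) / length(y)

# Information gain: parent impurity minus the size-weighted impurity of the children.
function info_gain_sketch(leftY, rightY, parentUncertainty; impurity = gini_sketch)
    p = length(leftY) / (length(leftY) + length(rightY))
    return parentUncertainty - p * impurity(leftY) - (1 - p) * impurity(rightY)
end

parent = ["a", "a", "b", "b"]
perfect = info_gain_sketch(["a", "a"], ["b", "b"], gini_sketch(parent))
useless = info_gain_sketch(["a", "b"], ["a", "b"], gini_sketch(parent))
```

A split that separates the two classes completely recovers the whole parent impurity (0.5 with gini), while a split whose children have the same label mix as the parent yields no gain.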
BetaML.Trees.match
— Method
match(question, x)
Return the dichotomous (true/false) answer to a question when applied to a given feature record.
It compares the feature value in the given record to the value stored in the question. Numerical features are compared in terms of inequality (">="), while categorical features are compared in terms of equality ("==").
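The comparison rule can be illustrated with a few lines of code. QuestionSketch and matches_sketch are hypothetical names used only for this example; the struct mirrors the 'column number' and 'column value' pair described for Question.

```julia
# Hypothetical stand-in for the Question struct: a column number and a value.
struct QuestionSketch
    column::Int
    value::Any
end

# Numerical features: "is the record's value >= the question's value?"
# Any other feature: "is the record's value == the question's value?"
function matches_sketch(q::QuestionSketch, record)
    v = record[q.column]
    return (v isa Number && q.value isa Number) ? v >= q.value : v == q.value
end

record = ["Green", 3.2]
isGreen = matches_sketch(QuestionSketch(1, "Green"), record)
atLeast3 = matches_sketch(QuestionSketch(2, 3.0), record)
atLeast4 = matches_sketch(QuestionSketch(2, 4.0), record)
```

For this record the colour question and the ">= 3.0" question are answered true, while the ">= 4.0" question is answered false.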
BetaML.Trees.partition
— Method
partition(question,x)
Dichotomously partitions a dataset x given a question.
For each row in the dataset, check if it matches the question. If so, add it to the 'true rows'; otherwise, add it to the 'false rows'. Rows with missing values in the question column are assigned randomly, in proportion to the assignment of the non-missing rows.
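One possible reading of the missing-value rule is sketched below: rows with a known value are assigned by the question, and each row with a missing value is sent to the 'true rows' with a probability equal to the share of known rows that matched. The function partition_sketch, its argument order and the rng keyword are assumptions for illustration; BetaML's actual procedure may differ.

```julia
using Random

# Returns a Boolean mask: true for 'true rows', false for 'false rows'.
function partition_sketch(column, value, x; rng = Random.default_rng())
    n = size(x, 1)
    known = [!ismissing(x[i, column]) for i in 1:n]
    mask = falses(n)
    # Rows with a known value are assigned by the question itself.
    for i in 1:n
        known[i] || continue
        v = x[i, column]
        mask[i] = v isa Number ? v >= value : v == value
    end
    # Share of the known rows that answered "true" (0.5 if no value is known).
    nKnown = count(known)
    pTrue = nKnown == 0 ? 0.5 : count(mask) / nKnown
    # Rows with a missing value are assigned at random with that probability.
    for i in 1:n
        known[i] && continue
        mask[i] = rand(rng) < pTrue
    end
    return mask
end

x = [1.0 "a"; 5.0 "b"; missing "c"; 7.0 "d"]
mask = partition_sketch(1, 5.0, x; rng = MersenneTwister(42))
trueRows, falseRows = x[mask, :], x[.!mask, :]
```

Rows 2 and 4 always end up in the 'true rows' and row 1 in the 'false rows'; row 3, whose value is missing, goes to the 'true rows' with probability 2/3 in this sketch.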
BetaML.Trees.predictSingle
— Method
predictSingle(forest,x)
Predict the label of a single feature record. See predict.
BetaML.Trees.predictSingle
— Method
predictSingle(tree,x)
Predict the label of a single feature record. See predict.