Modal Decision Trees & Forests

Interpretable models for native time-series & image classification!

This package provides algorithms for learning decision trees and decision forests with enhanced expressiveness. Leveraging the expressive power of Modal Logic, these models can extract temporal/spatial patterns and natively handle time series and images (without any data preprocessing). The models are currently available via MLJ.jl and Sole.jl.

Features & differences from DecisionTree.jl:

The MLJ models provided (ModalDecisionTree and ModalRandomForest) can act as drop-in replacements for DecisionTree.jl's tree and forest models. The main difference is that the two models provided are probabilistic and can perform both classification (with y labels of type String or CategoricalValue) and regression (with numeric y labels).
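
For instance, on plain tabular (scalar) data, usage mirrors DecisionTree.jl's MLJ interface. The snippet below is only a minimal sketch with synthetic data; the column names and sample sizes are illustrative, and it assumes purely scalar columns are accepted as-is:

using MLJ, ModalDecisionTrees

# Synthetic tabular data: two scalar features, 100 instances
X = (; x1 = rand(100), x2 = rand(100))
y_class = rand(["yes", "no"], 100)  # String labels  -> classification
y_reg   = rand(100)                 # numeric labels -> regression

# Classification: `predict` is probabilistic, `predict_mode` yields hard labels
mach = machine(ModalDecisionTree(), X, y_class)
fit!(mach)
predict(mach, X)       # distributions over the two classes
predict_mode(mach, X)  # class labels

# Regression: same model type, numeric target
mach_reg = machine(ModalDecisionTree(), X, y_reg)
fit!(mach_reg)
predict(mach_reg, X)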

Additionally, these models:

  • Are able to handle variables that are AbstractVector{<:Real} or AbstractMatrix{<:Real};
  • Support multimodal learning (e.g., learning from combinations of scalars, time series and images; see the data-layout sketch after this list);
  • Are based on a unique algorithm that extends both CART and C4.5.
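
For the multimodal case, the expected data layout is a table where each cell of a vector- or matrix-valued column holds one time series or one image per instance. The following is a minimal sketch with made-up column names and sizes, assuming such a mixed column table is accepted directly:

using MLJ, ModalDecisionTrees

n = 60
X = (;
    age       = 20 .+ 60 .* rand(n),         # scalar variable
    heartbeat = [rand(10)   for _ in 1:n],   # temporal variable: one vector per instance
    scan      = [rand(5, 5) for _ in 1:n],   # spatial variable: one matrix per instance
)
y = rand(["healthy", "sick"], n)

mach = machine(ModalDecisionTree(), X, y)
fit!(mach)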

Current limitations (also see TODOs):

  • Only supports numeric features;
  • Does not support missing or NaN values.

JuliaCon 2022 8-minute talk

Installation & Usage

Simply type the following commands in Julia's REPL:

# Install package
using Pkg; Pkg.add("MLJ");
using Pkg; Pkg.add("ModalDecisionTrees");

# Import packages
using MLJ
using ModalDecisionTrees
using Random

# Load an example dataset (a temporal one)
X, y = ModalDecisionTrees.load_japanesevowels()
N = length(y)

# Instantiate an MLJ machine based on a Modal Decision Tree with at least 4 samples per leaf
mach = machine(ModalDecisionTree(min_samples_leaf=4), X, y)

# Split dataset
p = randperm(N)
train_idxs, test_idxs = p[1:round(Int, N*.8)], p[round(Int, N*.8)+1:end]

# Fit
fit!(mach, rows=train_idxs)

# Perform predictions, compute accuracy
yhat = predict_mode(mach, X[test_idxs,:])
accuracy = MLJ.accuracy(yhat, y[test_idxs])

# Print model
report(mach).printmodel(3)

# Access raw model
model = fitted_params(mach).model
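
Continuing from the example above, the forest counterpart can be swapped in and assessed with MLJ's standard evaluation tooling; the sketch below uses ModalRandomForest with its default hyperparameters:

# Evaluate a Modal Random Forest via 5-fold cross-validation
forest_mach = machine(ModalRandomForest(), X, y)
evaluate!(forest_mach,
          resampling = CV(nfolds = 5, shuffle = true, rng = 1),
          operation  = predict_mode,
          measure    = MLJ.accuracy)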

Theoretical foundations

Most works in symbolic learning are based on either Propositional Logics (PLs) or First-Order Logics (FOLs); PLs are the simplest kind of logic and can only handle tabular data, while FOLs can express complex entity-relation concepts. Machine learning with FOLs enables handling data with complex topologies, such as time series, images, or videos; however, these logics are computationally challenging. Modal Logics (e.g., Interval Logic) instead offer a good trade-off between computational tractability and expressive power, and naturally lend themselves to expressing some forms of temporal/spatial reasoning.

Recently, symbolic learning techniques such as Decision Trees, Random Forests, and Rule-Based models have been extended to Modal Logics of time and space. Modal Decision Trees and Modal Random Forests have been applied to classification tasks, showing statistical performance that is often comparable to that of functional methods (e.g., neural networks), while providing highly interpretable classification models. Examples of these tasks are COVID-19 diagnosis from cough/breath audio [1], [2], land cover classification from aerial images [3], EEG-related tasks [4], and gas turbine trip prediction. This technology also offers a natural extension to multimodal learning [5].

Credits

ModalDecisionTrees.jl lives within the Sole.jl framework for symbolic machine learning.

The package is developed by the ACLAI Lab @ University of Ferrara.

Thanks to Ben Sadeghi (@bensadeghi), author of DecisionTree.jl, which inspired the development of this package.