fairevaluate(classifiers, X, y; measures=nothing, measure=nothing, grp=:class, priv_grps=nothing, random_seed=12345, n_grps=6)

Performs a paired t-test for each pair of classifiers in classifiers and returns the p-values and t-statistics.
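In a paired t-test, each classifier is scored on the same cross-validation folds, and the test works on the per-fold differences between two classifiers. The sketch below computes that t-statistic for one pair using only the Statistics standard library. It is an illustration of the technique, not necessarily how fairevaluate computes it internally; the helper name paired_tstat and the fold scores are hypothetical.

```julia
using Statistics

# Paired t-statistic for two classifiers evaluated on the same folds.
# Each element of `a` and `b` is one fold's score on a single measure.
function paired_tstat(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    length(a) == length(b) || throw(ArgumentError("need one score per fold for both classifiers"))
    d = a .- b                            # per-fold differences
    n = length(d)
    return mean(d) / (std(d) / sqrt(n))   # std uses the sample (n - 1) correction
end

# Hypothetical per-fold Disparate Impact values for two classifiers (6 folds)
clf_a = [0.80, 0.85, 0.78, 0.90, 0.82, 0.88]
clf_b = [0.70, 0.75, 0.72, 0.80, 0.74, 0.79]
t = paired_tstat(clf_a, clf_b)
println("t = ", round(t, digits=3))
```

The p-value then comes from a t distribution with n - 1 degrees of freedom; HypothesisTests.jl's OneSampleTTest(x, y) performs this paired test and pvalue extracts the result.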


  • classifiers: Array of classifiers to compare
  • X: DataFrame with features and protected attribute
  • y: Binary target variable
  • measures=nothing: The measures to be evaluated and used for the hypothesis tests. If this is not specified, the measure argument is used.
  • measure=nothing: The performance/fairness measure used to perform the hypothesis tests. If no measure is passed, Disparate Impact is used by default.
  • grp=:class: Name of the protected attribute
  • priv_grps=nothing: If the default measure (Disparate Impact) is used, pass an array of the groups that are privileged in the dataset.
  • random_seed=12345: Random seed to ensure reproducibility
  • n_grps=6: Number of folds for cross-validation
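random_seed and n_grps together determine a reproducible k-fold split. The sketch below shows one generic way to build such a split: shuffle the sample indices with a seeded generator, then deal them round-robin into k folds. The function seeded_folds is hypothetical and may not match how fairevaluate actually assigns folds.

```julia
using Random

# Assign sample indices 1:n to k folds after a seeded shuffle, so the same
# seed always yields the same split. Fold sizes differ by at most one.
function seeded_folds(n::Integer, k::Integer; seed::Integer=12345)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= n"))
    rng = MersenneTwister(seed)
    perm = randperm(rng, n)
    return [perm[i:k:n] for i in 1:k]   # round-robin assignment to folds
end

folds = seeded_folds(20, 6)
println(length.(folds))
```

Because the generator is seeded, calling seeded_folds(20, 6) again returns exactly the same folds, which is what makes repeated evaluations comparable.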


A dictionary with the following keys and values is returned:

  • measures: names of the measures
  • classifier_names: names of the classifiers. If a pipeline is used, the name shows the pipeline and its associated number.
  • results: 3-dimensional array of evaluation results, of size measures x classifiers x folds.
  • pvalues: 3-dimensional array of p-values for each pair of classifiers, of size measures x classifiers x classifiers.
  • tstats: 3-dimensional array of t-statistics for each pair of classifiers, of size measures x classifiers x classifiers.
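The shapes above relate as follows: for each measure, every ordered pair of classifiers is compared across the folds stored in results, filling one entry of a measures x classifiers x classifiers array. The sketch below derives such a tstats array from a results array with the Statistics standard library. The function pairwise_tstats and the example numbers are hypothetical, and fairevaluate may fill its diagonal or handle edge cases differently.

```julia
using Statistics

# results[m, c, f]: score of classifier c on measure m in fold f
# (measures x classifiers x folds).
# Returns tstats[m, i, j]: paired t-statistic comparing classifiers i and j on measure m.
function pairwise_tstats(results::AbstractArray{<:Real,3})
    nm, nc, nf = size(results)
    tstats = zeros(nm, nc, nc)   # diagonal (i == j) is left at zero
    for m in 1:nm, i in 1:nc, j in 1:nc
        i == j && continue
        d = results[m, i, :] .- results[m, j, :]   # per-fold differences
        # If all differences are equal, std(d) is zero and this yields Inf or NaN.
        tstats[m, i, j] = mean(d) / (std(d) / sqrt(nf))
    end
    return tstats
end

# Hypothetical results: 1 measure, 3 classifiers, 4 folds
results = zeros(1, 3, 4)
results[1, 1, :] = [0.9, 0.8, 0.85, 0.95]
results[1, 2, :] = [0.7, 0.75, 0.65, 0.8]
results[1, 3, :] = [0.6, 0.55, 0.7, 0.65]
ts = pairwise_tstats(results)
```

Swapping the two classifiers only flips the sign of the differences, so tstats[m, i, j] equals -tstats[m, j, i]; the two-sided p-value is the same for both orderings.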