The following list of milestones is intended to guide the core developers on the future direction of the package's development. The list is by no means exhaustive and will be updated over time as development progresses and new algorithms are proposed by the research community.

The list is algorithm- and feature-oriented, as the goal of the package is to give the community access to a tool that has all the necessary functionality for fairness, accountability and transparency (FAT) research and deployment.

Milestone 1 ✔

The first milestone is our first public release of the package – version 0.0.1. The following functionality should be available.




Data / Features

  • Systemic Bias (disparate treatment labelling)

  • Sample size disparity (e.g., class imbalance)

  • Sampling bias

  • Data Density Checker

  • Data description


  • Group-based fairness (disparate impact)

  • Systematic performance bias

  • Partial dependence

  • Individual conditional expectation


  • Counterfactual fairness (disparate treatment)

  • Counterfactuals

  • Tabular LIME (wrapper)

Milestone 2

This will be the first major update of the package. The focus will be placed on the transparency module. Nevertheless, some additional fairness and accountability functionality will be implemented as well.




Data / Features

  • k-anonymity

  • l-diversity

  • t-closeness


  • Additional fairness metrics (to be decided)

  • Background check

  • PD/ICE enhancements

  • Scikit-learn model explainers


  • Forestspy

  • Tree interpreter

  • Feature importance

  • Model reliance


  • Counterfactual explainer for logical models and their ensembles


  • Scikit-learn prediction explainers

  • Generalised local surrogates (bLIMEy)

  • bLIMEy LIME implementation for tabular, text and image data

  • Extra fairness metrics.

    • Implement additional group-based fairness metrics.

    • Implement threshold computation based on the selected group metric equality.

    • Implement Jupyter Notebook interactive plugins (widgets) to allow the community to play with the fairness concepts (e.g., widgets similar to the interactive figures in this Google blog post).
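The threshold-computation item above can be illustrated with a small, library-free sketch. Everything here is hypothetical (the function names, the brute-force search over candidate thresholds, and the choice of positive rate/demographic parity as the group metric); it is not the package's API:

```python
from itertools import product

def positive_rate(scores, threshold):
    """Fraction of instances predicted positive at a given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def equalising_thresholds(groups, candidates):
    """Pick one threshold per protected group so that the groups'
    positive rates (demographic parity) are as equal as possible."""
    best, best_gap = None, float('inf')
    for combo in product(candidates, repeat=len(groups)):
        rates = [positive_rate(s, t) for s, t in zip(groups.values(), combo)]
        gap = max(rates) - min(rates)
        if gap < best_gap:
            best, best_gap = dict(zip(groups, combo)), gap
    return best

# Toy prediction scores for two protected groups.
groups = {'a': [0.2, 0.4, 0.6, 0.9], 'b': [0.1, 0.3, 0.5, 0.7]}
thresholds = equalising_thresholds(groups, candidates=[0.25, 0.5, 0.75])
```

Any other group metric (e.g., true positive rate for equal opportunity) can be substituted for `positive_rate`.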

  • Merge the pull request with k-anonymity, l-diversity and t-closeness.
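In its simplest form, the k-anonymity check from that pull request can be sketched with the standard library alone (the function name and the row format are assumptions made for this example):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """A data set is k-anonymous if every combination of quasi-identifier
    values that appears in it is shared by at least k rows."""
    counts = Counter(
        tuple(row[qi] for qi in quasi_identifiers) for row in rows)
    return min(counts.values()) >= k

data = [
    {'age': '30-39', 'zip': '123**', 'disease': 'flu'},
    {'age': '30-39', 'zip': '123**', 'disease': 'cold'},
    {'age': '40-49', 'zip': '456**', 'disease': 'flu'},
]
```

l-diversity and t-closeness add further conditions on the distribution of the sensitive attribute (here `disease`) within each such group.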

  • Implement Background Check.

  • PD and ICE enhancements (pull request).

    • 2-D implementation.

    • Implementation for classification and regression.

    • Improved visualisations.
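The relationship between ICE and PD that these enhancements build on is easy to state: an ICE curve varies one feature for a single instance while keeping the rest fixed, and the PD curve is the point-wise mean of the ICE curves. A minimal sketch, assuming a model is any callable over a feature dictionary:

```python
def ice_curves(model, data, feature, grid):
    """One curve per instance: vary `feature` over `grid`, keep the rest fixed."""
    curves = []
    for row in data:
        curves.append([model(dict(row, **{feature: value})) for value in grid])
    return curves

def partial_dependence(model, data, feature, grid):
    """The PD curve is the point-wise average of the ICE curves."""
    curves = ice_curves(model, data, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]

# Toy model with a known linear dependence on feature 'x'.
model = lambda row: 2 * row['x'] + row['y']
data = [{'x': 0, 'y': 1}, {'x': 5, 'y': 3}]
pd_curve = partial_dependence(model, data, 'x', grid=[0, 1, 2])
```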

  • Scikit-learn model explainers (cf. the reference implementation in the eli5 package).

    • Decision trees.

      • Feature importance.

      • Decision tree structure (tree plot).

    • Rule lists and sets (these can share a common representation with the trees).

      • Rule list structure (rule list in a text form).

    • Linear models.

      • Feature importance (coefficients).

    • K-means.

      • Prototypes.

        • Similarities between examples in a cluster that are correctly assigned to this cluster.

      • Criticisms.

        • Similarities between examples in a cluster that are incorrectly assigned to this cluster.

  • Implement ANCHOR.

  • Implement forestspy.

  • Implement Tree Interpreter.

    • “The global feature importance of random forests can be quantified by the total decrease in node impurity averaged over all trees of the ensemble (‘mean decrease impurity’).”

    • “We can use the difference between the mean value of data points in a parent node and that of a child node to approximate the contribution of this split…”

    • The “Interpreting random forests” and “Random forest interpretation with scikit-learn” blog posts by Ando Saabas hold some useful information about this technique.
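The decomposition described in those posts can be sketched without any library: a prediction equals the root's mean (the bias) plus, for every split on the root-to-leaf path, the change in node mean from parent to child. The dictionary tree encoding below is a hypothetical simplification for illustration:

```python
# Each node stores the mean target value of the training data that reaches it;
# internal nodes additionally store their split.
tree = {
    'mean': 10.0, 'feature': 'x', 'threshold': 5.0,
    'left': {'mean': 4.0},
    'right': {'mean': 16.0},
}

def explain_prediction(node, instance):
    """Walk the root-to-leaf path; the contribution of each split is the
    change in node mean between a child and its parent."""
    bias = node['mean']
    contributions = []
    while 'feature' in node:
        child = (node['left'] if instance[node['feature']] <= node['threshold']
                 else node['right'])
        contributions.append((node['feature'], child['mean'] - node['mean']))
        node = child
    return bias, contributions, node['mean']

bias, contributions, prediction = explain_prediction(tree, {'x': 7})
```

For a forest, the biases and per-feature contributions are averaged over the trees of the ensemble.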

  • Implement a variety of feature importance metrics.

    • Random forest feature (variable) importance (“Random Forests”, Leo Breiman, 2001). (Similar to permutation importance.)

    • XGboost feature importance.

      • Feature weight – the number of times a feature appears in a tree (ensemble).

      • Gain – the average gain of splits that use the feature.

      • Coverage – the average coverage (number of samples affected) of splits that use the feature.
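The three importance types above can be demonstrated on a flat list of recorded splits; the `(feature, gain, cover)` tuple format is an assumption made for this library-free sketch, not XGBoost's actual data structure:

```python
from collections import defaultdict

# Every split in the ensemble: the feature it uses, the gain (loss
# reduction) it achieved and its cover (number of samples it affected).
splits = [('x', 10.0, 100), ('x', 6.0, 40), ('y', 8.0, 80)]

def importance(splits, kind):
    """weight: how often a feature is split on; gain/cover: the average
    gain/cover of the splits that use the feature."""
    totals, counts = defaultdict(float), defaultdict(int)
    for feature, gain, cover in splits:
        counts[feature] += 1
        totals[feature] += gain if kind == 'gain' else cover
    if kind == 'weight':
        return dict(counts)
    return {feature: totals[feature] / counts[feature] for feature in totals}
```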

    • Skater feature importance.

    • Prediction variance – mean absolute value of changes in predictions given perturbations in the data.

    • Wei, Pengfei, Zhenzhou Lu, and Jingwen Song. “Variable Importance Analysis: A Comprehensive Review”. Reliability Engineering & System Safety 142 (2015): 399–432.

    • Scikit-learn and eli5 permutation importance (a.k.a. Mean Decrease Accuracy (MDA)).

      • eli5 implementation.

      • (These may be sensitive to features being correlated – a user guide note should suffice.)
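Permutation importance (MDA) itself fits in a few lines; this stdlib-only sketch treats a model as a plain callable and measures the average accuracy drop after shuffling a single feature column:

```python
import random

def permutation_importance(model, X, y, feature_index, n_repeats=10, seed=0):
    """Mean Decrease Accuracy: shuffle one feature column and record how
    much the model's accuracy drops relative to the intact data."""
    rng = random.Random(seed)
    accuracy = lambda rows: sum(model(r) == t for r, t in zip(rows, y)) / len(y)
    baseline = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature_index] for row in X]
        rng.shuffle(column)
        shuffled = [row[:feature_index] + [v] + row[feature_index + 1:]
                    for row, v in zip(X, column)]
        drops.append(baseline - accuracy(shuffled))
    return sum(drops) / n_repeats

# A model that only ever looks at feature 0, so feature 1 scores zero.
model = lambda row: row[0] > 0
X = [[1, 9], [-1, 9], [1, -9], [-1, -9]]
y = [True, False, True, False]
```

The correlated-features caveat above applies directly: shuffling one of two correlated columns creates unrealistic instances and dilutes both features' scores.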

  • Implement model reliance (Fisher, 2018). (“All Models are Wrong, but Many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance”, Aaron Fisher, Cynthia Rudin, Francesca Dominici.)

  • Implement TREPAN (tree surrogate).

    • “Extracting Comprehensible Models From Trained Neural Networks”, Mark W. Craven (1996). (PhD thesis)

    • “Extracting Tree-Structured Representations of Trained Networks”, Mark W. Craven and Jude W. Shavlik (NIPS, 1996). (NIPS paper)

    • “Study of Various Decision Tree Pruning Methods with their Empirical Comparison in WEKA”, Nikita Patel and Saurabh Upadhyay (2012). (report)

    • TREPAN implementation in Skater.

  • Implement a counterfactual explainer for logical models and their ensembles.

  • Scikit-learn prediction explainers.

    • Decision trees.

      • Root-to-leaf path (logical conditions).

      • Counterfactuals.

    • Rule lists and sets.

      • Logical conditions list (as text).

    • Neighbours.

      • Similarities and differences (on the feature vector) among the neighbours of the same and the opposite class.

    • K-means.

      • Prototypes.

        • Nearest centroid of the same class.

      • Criticisms.

        • Nearest centroid of the opposite class.
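For the decision-tree prediction explainer, extracting the root-to-leaf path reduces to recording one logical condition per visited split. A minimal sketch over a hypothetical dictionary-encoded tree:

```python
# A node is either a leaf {'class': ...} or an internal node with a split.
tree = {'feature': 'age', 'threshold': 40,
        'left': {'class': 'accept'},
        'right': {'feature': 'income', 'threshold': 30000,
                  'left': {'class': 'reject'},
                  'right': {'class': 'accept'}}}

def decision_path(node, instance):
    """Collect the logical conditions on the root-to-leaf path."""
    conditions = []
    while 'feature' in node:
        feature, threshold = node['feature'], node['threshold']
        if instance[feature] <= threshold:
            conditions.append(f'{feature} <= {threshold}')
            node = node['left']
        else:
            conditions.append(f'{feature} > {threshold}')
            node = node['right']
    return conditions, node['class']

conditions, label = decision_path(tree, {'age': 50, 'income': 20000})
```

A counterfactual then follows from the same structure: find the closest leaf of the opposite class and report which of the path conditions have to be flipped to reach it.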

  • bLIMEy implementation.

  • Fresh LIME implementation.

    • Write tutorials similar to LIME tutorials, in particular this tutorial.

    • Have a look at what eli5 does: “eli5.lime provides dataset generation utilities for text data (remove random words) and for arbitrary data (sampling using Kernel Density Estimation) … for explaining predictions of probabilistic classifiers eli5 uses another classifier by default, trained using cross-entropy loss, while canonical library fits regression model on probability output.”

Milestone 3

The third milestone will integrate the tool with important machine learning and fairness packages.




Data / Features


  • Fairness360 integration

  • Distribution shift detection

  • Calibration

  • SHAP package integration (Shapley sampling values & Shapley regression values)

  • Xgboost package interpreter

  • LightGBM package interpreter

  • Lightning package interpreter

  • Sklearn-crfsuite package interpreter

  • eli5 package integration

  • Bayesian Rule Lists (BRL)

  • PD/ICE speed improvements

  • Interactive (JS) Jupyter Notebook plots



  • Integration or reimplementation of the fairness360 package (depending on the code quality).

  • Implement distribution shift metrics.

  • Implement calibration techniques.

  • Integration with the SHAP package.

  • Explainers for models implemented in the Xgboost package.

  • Explainers for models implemented in the LightGBM package.

  • Explainers for models implemented in the Lightning package.

  • Explainers for models implemented in the sklearn-crfsuite package.

  • eli5 integration. (“eli5 understands text processing utilities from scikit-learn and can highlight text data accordingly. Pipeline and FeatureUnion are supported. It also allows to debug scikit-learn pipelines which contain HashingVectorizer, by undoing hashing.”)

  • Implement Bayesian Rule Lists (BRL).

  • PD/ICE speed improvements – parallelisation and a progress bar.

  • iPython/Jupyter Notebook interactive (JS) plots to improve the research applicability of the package.
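For the distribution shift metrics planned above, a simple starting point is the two-sample Kolmogorov-Smirnov statistic comparing a feature's deployment-time values against the training sample; this stdlib-only sketch is illustrative only, not the package's API:

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs (0 = identical, 1 = fully separated)."""
    values = sorted(set(sample_a) | set(sample_b))
    cdf = lambda sample, v: sum(x <= v for x in sample) / len(sample)
    return max(abs(cdf(sample_a, v) - cdf(sample_b, v)) for v in values)

reference = [1, 2, 3, 4, 5]     # e.g., a feature in the training data
shifted = [11, 12, 13, 14, 15]  # the same feature at deployment time
```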

Milestone 4

This milestone focuses on implementing a collection of tools that will enable researchers and practitioners to use the package with (deep) neural networks (Deep Learning, autograd, optimisation).




Data / Features


  • what-if tool integration

  • Quantitative Input Influence (QII)

  • Layer-wise Relevance Propagation (e-LRP)

  • Occlusion

  • Integrated gradient



  • DeepLIFT (example explanation)

  • DeepExplain

  • Integration with the what-if tool.

  • Implement Quantitative Input Influence (QII).

  • Implement epsilon-Layer-wise Relevance Propagation (e-LRP).

    • “On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation”, Bach S, Binder A, Montavon G, Klauschen F, Muller K-R, Samek W (2015).

    • “Towards better understanding of gradient-based attribution methods for Deep Neural Networks”, Ancona M, Ceolini E, Oztireli C, Gross M (ICLR, 2018).

  • Implement occlusion.

    • “Visualizing and understanding convolutional networks”, Zeiler, M and Fergus, R (Springer, 2014).

    • Occlusion implementation.
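The idea behind occlusion is just a sliding perturbation: mask part of the input, re-query the model and record the score drop. A toy 1-D sketch (real implementations slide a 2-D patch over an image; all names here are illustrative):

```python
def occlusion_map(model, image, patch=1, baseline=0.0):
    """Slide an occluding patch over the input and record how much the
    model's score drops; large drops mark influential regions."""
    score = model(image)
    heatmap = []
    for i in range(len(image) - patch + 1):
        occluded = image[:i] + [baseline] * patch + image[i + patch:]
        heatmap.append(score - model(occluded))
    return heatmap

# Toy 1-D 'image' and a model that only responds to position 2.
model = lambda img: float(img[2])
heatmap = occlusion_map(model, [5.0, 5.0, 9.0, 5.0])
```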

  • Implement Integrated Gradient method.

  • Implement the DeepLIFT algorithm.

  • Implement the DeepExplain algorithm.

    • “Towards better understanding of gradient-based attribution methods for Deep Neural Networks”, Ancona M, Ceolini E, Oztireli C, Gross M (ICLR, 2018).

    • DeepExplain implementation.

  • Finalise full integration of Skater and SHAP (deep neural networks).
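As a reference for the Integrated Gradient item above: the method attributes a prediction to each input dimension by integrating the model's gradient along the straight line from a baseline to the input, then scaling by (input − baseline). A small sketch using finite-difference gradients (an assumption made here; real implementations use autograd):

```python
def integrated_gradients(f, x, baseline, steps=100, eps=1e-6):
    """Attribute f(x) - f(baseline) to the input dimensions by averaging
    the gradient along the baseline-to-input path and scaling it by
    (input - baseline)."""
    def gradient(point, i):
        bumped = list(point)
        bumped[i] += eps
        return (f(bumped) - f(point)) / eps

    attributions = []
    for i in range(len(x)):
        total = 0.0
        for step in range(1, steps + 1):
            alpha = step / steps
            point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
            total += gradient(point, i)
        attributions.append((x[i] - baseline[i]) * total / steps)
    return attributions

# For a linear model the attributions recover coefficient * input exactly.
f = lambda v: 3 * v[0] + 2 * v[1]
attributions = integrated_gradients(f, [1.0, 2.0], baseline=[0.0, 0.0])
```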