fatf.transparency.sklearn.tools.SKLearnExplainer

class fatf.transparency.sklearn.tools.SKLearnExplainer(clf: sklearn.base.BaseEstimator, feature_names: Optional[List[str]] = None, class_names: Optional[List[str]] = None)

Implements a base scikit-learn model explainer class.

New in version 0.0.2.

Every scikit-learn model explainer class should inherit from this class. It should also override the following four private methods: _validate_kind_fitted, _is_classifier, _get_features_number and _get_classes_array.

For their expected functionality please see their respective documentation.

The explainer should also implement one of the explanatory methods that are inherited from SKLearnExplainer’s parent class (fatf.utils.transparency.explainers.Explainer): feature_importance, explain_model or explain_instance.

Alternatively, a new method that explains an aspect of the model or its predictions can be introduced.

This class logs an info message if the feature names were not given and are inferred from the number of features using the “feature %d” pattern. An info message is also logged if the class names were not given and are inferred from the class ids (held in the classes_array attribute) using the “class %s” pattern.
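The two naming patterns mentioned above can be sketched as follows. This is an illustration only, not the library's code: the helper names are hypothetical, and starting the feature index at 0 is an assumption.

```python
def default_feature_names(features_number):
    """Build placeholder feature names with the "feature %d" pattern."""
    # Assumption: indices start at 0, mirroring numpy column indexing.
    return ['feature %d' % index for index in range(features_number)]


def default_class_names(classes_array):
    """Build placeholder class names with the "class %s" pattern."""
    return ['class %s' % class_id for class_id in classes_array]


feature_names = default_feature_names(3)
class_names = default_class_names(['0', 'a', 'aa', 'b'])
```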

Parameters
clf : sklearn.base.BaseEstimator

A scikit-learn model.

feature_names : Optional[List[string]]

A list of strings representing feature names in the order they appear in the numpy array used to train the clf predictive model.

class_names : Optional[List[string]]

A list of strings representing class names. The order of this list has to correspond to the lexicographical ordering of the unique values in the target (ground truth) array used to train the clf predictor. For example, if your target array has the values ['aa', 'a', '0', 'b'], your class names should follow this ordering of the class ids: ['0', 'a', 'aa', 'b'].
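As a quick check of the ordering rule, plain Python sorting reproduces the lexicographical order of the example target values. The human-readable names below are made up for illustration.

```python
# Target (ground truth) values as they might appear in the training data.
target_values = ['aa', 'a', '0', 'b', 'a', '0']

# Unique class ids in lexicographical order.
class_ids = sorted(set(target_values))

# Hypothetical readable names, listed in the same order as class_ids.
class_names = ['zero', 'single a', 'double a', 'single b']

# Pair every class id with the name at the same position.
id_to_name = dict(zip(class_ids, class_names))
```

Here class_ids evaluates to ['0', 'a', 'aa', 'b'], so the name at position 2 ('double a') describes the class id 'aa'.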

Attributes
clf : sklearn.base.BaseEstimator

A fitted scikit-learn model.

feature_names : Union[None, List[string]]

Either None or a list of feature names in the order they appear in the numpy array used to train the clf classifier.

class_names : Union[None, List[string]]

Either None or a list of class names in the order of the lexicographically sorted unique values (class ids) in the target (ground truth) array used to train the clf predictor.

is_classifier : boolean

If True, the predictive model held under the clf attribute is a classifier. If False, it is a regressor. (Set using the clf attribute via the _is_classifier method.)

features_number : Union[None, integer]

Either None or the number of features in the clf model. (Extracted from the clf attribute with the _get_features_number method.)

classes_array : Union[None, numpy.ndarray]

Either None or a 1-dimensional numpy array holding all the possible model predictions (only for classifiers). For regressors this should always be None.

Raises
TypeError

The clf object is not a scikit-learn model, i.e., it does not inherit from sklearn.base.BaseEstimator. The feature_names parameter is neither a Python list nor None. One of the elements of the feature_names list is not a string. The class_names parameter is neither a Python list nor None. One of the elements of the class_names list is not a string.

ValueError

Either the feature_names or the class_names list is empty. The length of the feature_names list differs from the features number extracted from the classifier. The length of the class_names list differs from the length of the classes_array extracted from the classifier.

Warns
UserWarning

The features number is not given, therefore the length of the feature names list cannot be validated. The classes array is not given, therefore the length of the class names list cannot be validated.
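The checks listed under Raises and Warns can be pictured with a small stand-alone validator. This is a sketch under assumptions, not the library's implementation: validate_names is a hypothetical helper, and the order of the checks and the wording of the messages are guesses.

```python
import warnings


def validate_names(names, expected_length, label):
    """Validate a feature_names or class_names list (hypothetical helper)."""
    if names is None:
        return  # Nothing to validate; names would be inferred instead.
    if not isinstance(names, list):
        raise TypeError('%s must be a Python list or None.' % label)
    if not names:
        raise ValueError('%s cannot be an empty list.' % label)
    for name in names:
        if not isinstance(name, str):
            raise TypeError('All elements of %s must be strings.' % label)
    if expected_length is None:
        # The model did not expose the expected length, so only warn.
        warnings.warn(
            'The length of %s cannot be validated.' % label, UserWarning)
    elif len(names) != expected_length:
        raise ValueError(
            '%s has %d elements, expected %d.'
            % (label, len(names), expected_length))


validate_names(['height', 'width'], 2, 'feature_names')  # passes silently
```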

Methods

explain_instance()

Generates an explanation of a single data point (instance).

explain_model()

Generates a model explanation.

feature_importance()

Computes feature importance.

map_class(clf_class: Union[int, str])

Maps a class id output by the classifier to a class name.

_get_classes_array() → Optional[numpy.ndarray]

Retrieves the array with classes that the predictive model can output.

For regressors this method must return None. For classifiers it should return a 1-dimensional numpy array holding all the classification results that the model can output, provided they can be extracted from self.clf, and None otherwise.

Returns
classes_array : Union[numpy.ndarray, None]

A 1-dimensional numpy array holding all the possible model predictions (only for classifiers) or None.

Raises
NotImplementedError

This error is always raised since the method is an abstract method.

_get_features_number() → Optional[int]

Returns the number of features that the model accepts or None.

If it is possible to extract the number of features (columns) expected by the self.clf predictor, this method should return this number. Otherwise, it must return None.

Returns
features_number : Union[integer, None]

The number of features accepted by the classifier or None.

Raises
NotImplementedError

This error is always raised since the method is an abstract method.

_is_classifier() → bool

Indicates whether the clf model is a classifier or a regressor.

This method should return True if the model that this class explains is a classifier and False if it is a regressor.

Returns
is_classifier : boolean

True if the self.clf model is a classifier or False when it is a regressor.

Raises
NotImplementedError

This error is always raised since the method is an abstract method.
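One possible way to back the three metadata methods (_get_classes_array, _get_features_number and _is_classifier) is shown below as free functions, so that the snippet runs without fatf installed. In a subclass they would become method bodies operating on self.clf. The function names are hypothetical, and relying on scikit-learn's classes_ and n_features_in_ fitted attributes (the latter available from scikit-learn 0.24) is an assumption that may not suit every estimator.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.linear_model import LinearRegression, LogisticRegression


def model_is_classifier(clf):
    """Return True for classifiers and False for regressors."""
    return is_classifier(clf)


def get_features_number(clf):
    """Return the number of input columns seen during fit, or None."""
    # Assumption: the fitted estimator exposes n_features_in_.
    return getattr(clf, 'n_features_in_', None)


def get_classes_array(clf):
    """Return a 1-D array of possible predictions, or None."""
    if not is_classifier(clf):
        return None  # Regressors have no classes.
    classes = getattr(clf, 'classes_', None)
    return None if classes is None else np.asarray(classes)


# Tiny toy data set with two features.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
classifier = LogisticRegression().fit(X, np.array(['a', 'a', 'b', 'b']))
regressor = LinearRegression().fit(X, np.array([0.0, 1.0, 2.0, 3.0]))
```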

_validate_kind_fitted() → bool

Implements a kind check and a fit check of a predictive model.

This method is called upon initialising the class and checks whether the self.clf predictor is of the right kind. For example, when implementing an explainer for scikit-learn linear models this method should check whether self.clf is a linear model and whether it has been fitted. If either of these conditions is not satisfied this method should raise an appropriate exception: for a wrong model type this should be a ValueError; for an unfitted model this should be scikit-learn’s sklearn.exceptions.NotFittedError (consider using scikit-learn’s sklearn.utils.validation.check_is_fitted function to raise this exception).

Returns
is_valid : boolean

True if the kind of the self.clf model is correct and the model is fitted, False otherwise.

Raises
NotImplementedError

This error is always raised since the method is an abstract method.
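The kind check and the fit check described for _validate_kind_fitted could look roughly like this. It is a sketch written as a free function rather than the method itself; SUPPORTED_KINDS and validate_kind_fitted are hypothetical names, and the choice of accepted model types is only an example.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils.validation import check_is_fitted

# Hypothetical set of model types that this explainer supports.
SUPPORTED_KINDS = (LinearRegression, LogisticRegression)


def validate_kind_fitted(clf):
    """Raise if clf is of the wrong kind or unfitted, else return True."""
    if not isinstance(clf, SUPPORTED_KINDS):
        raise ValueError(
            'Unsupported model type: %s.' % type(clf).__name__)
    # Raises sklearn.exceptions.NotFittedError for an unfitted model.
    check_is_fitted(clf)
    return True


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
fitted = LinearRegression().fit(X, y)
unfitted = LinearRegression()
wrong_kind = DecisionTreeRegressor()
```

Note that NotFittedError itself inherits from ValueError, so callers that need to tell the two failures apart should catch NotFittedError first.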

explain_instance() → numpy.ndarray

Generates an explanation of a single data point (instance).

This can be an explanation of a data point from a data set or of a prediction provided by a predictive model.

explain_model() → numpy.ndarray

Generates a model explanation.

feature_importance() → numpy.ndarray

Computes feature importance.

map_class(clf_class: Union[int, str]) → str

Maps a class id output by the classifier to a class name.

A mapping will only be provided if the class was initialised with class names or an array of possible predictions was extracted from the classifier.

Parameters
clf_class : Union[integer, string]

A class id output by the classifier.

Returns
mapped_class : string

A class name corresponding to the class id.

Raises
RuntimeError

The error is raised when trying to map a class for a regressor. It is also raised if the class was not sufficiently initialised, i.e., either the classes_array or the class_names attribute is missing.

TypeError

The clf_class parameter is neither an integer nor a string.

ValueError

The given clf_class is not one of the values that the classifier can output.
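The documented behaviour of map_class can be mimicked with a stand-alone function. This is a sketch, not the library's code: the function takes the relevant attributes as explicit arguments (an assumption made so the snippet is self-contained), and the error messages are invented.

```python
import numpy as np


def map_class(clf_class, classes_array, class_names, is_classifier=True):
    """Map a class id to its class name (illustrative stand-in)."""
    if not is_classifier:
        raise RuntimeError('Class mapping is undefined for regressors.')
    if classes_array is None or class_names is None:
        raise RuntimeError('classes_array and class_names are required.')
    if not isinstance(clf_class, (int, str)):
        raise TypeError('clf_class must be an integer or a string.')
    # Convert to plain Python values so that comparisons are unambiguous.
    class_ids = np.asarray(classes_array).tolist()
    if clf_class not in class_ids:
        raise ValueError('%r is not a class that the model can output.'
                         % (clf_class,))
    return class_names[class_ids.index(clf_class)]


classes_array = np.array(['0', 'a', 'aa', 'b'])
class_names = ['zero', 'single a', 'double a', 'single b']  # made-up names
mapped = map_class('aa', classes_array, class_names)
```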