fatf.transparency.sklearn.tools.SKLearnExplainer
class fatf.transparency.sklearn.tools.SKLearnExplainer(clf: sklearn.base.BaseEstimator, feature_names: Optional[List[str]] = None, class_names: Optional[List[str]] = None)
Implements a base scikit-learn model explainer class.
New in version 0.0.2.
Every scikit-learn model explainer class should inherit from this class. It should also override the following four private methods:
- _validate_kind_fitted,
- _is_classifier,
- _get_features_number, and
- _get_classes_array.
For their expected functionality please see their respective documentation.
The explainer should also implement at least one of the explanatory methods inherited from SKLearnExplainer’s parent class (fatf.utils.transparency.explainers.Explainer): feature_importance, explain_model, and/or explain_instance.
Alternatively, a new method that explains an aspect of the model or its predictions can be introduced.
This class logs an informational message if the feature names were not given and are inferred from the provided number of features using the “feature %d” pattern. An informational message is also logged if the class names were not given and are inferred from the provided class ids (the classes_array attribute) using the “class %s” pattern.
- Parameters
- clf : sklearn.base.BaseEstimator
A scikit-learn model.
- feature_names : Optional[List[string]]
A list of strings representing feature names in the order they appear in the numpy array used to train the clf predictive model.
- class_names : Optional[List[string]]
A list of strings representing class names. The order of this list has to correspond to the lexicographical ordering of the unique values in the target (ground truth) array used to train the clf predictor. For example, if your target array has the values ['aa', 'a', '0', 'b'], your class names should be given for the following ordering of the class ids: ['0', 'a', 'aa', 'b'].
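The ordering requirement for class_names can be checked with plain Python. The sketch below assumes string class ids and relies on Python's built-in sorting; the display names are invented for illustration and are not part of the library.

```python
# Target values from the example above, with a few repeats
# as they might appear in a ground-truth array.
target_values = ['aa', 'a', '0', 'b', 'a', '0']

# Unique class ids in lexicographical order.
class_ids = sorted(set(target_values))
print(class_ids)  # ['0', 'a', 'aa', 'b']

# Hypothetical display names, listed in the same order as class_ids.
class_names = ['zero', 'single a', 'double a', 'single b']

# Pair every class id with its display name.
id_to_name = dict(zip(class_ids, class_names))
print(id_to_name['aa'])  # double a
```

Listing the names in any other order would silently attach the wrong label to each class id.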
- Attributes
- clf : sklearn.base.BaseEstimator
A fitted scikit-learn model.
- feature_names : Union[None, List[string]]
Either None or a list of feature names in the order they appear in the numpy array used to train the clf classifier.
- class_names : Union[None, List[string]]
Either None or a list of class names in the order of lexicographically sorted unique values in the target (ground truth) array used to train the clf predictor (class ids).
- is_classifier : boolean
If True, the predictive model held under the clf attribute is a classifier. If False, it is a regressor. (Set using the clf attribute via the _is_classifier method.)
- features_number : Union[None, integer]
Either None or the number of features in the clf model. (Extracted from the clf attribute with the _get_features_number method.)
- classes_array : Union[None, numpy.ndarray]
Either None or a 1-dimensional numpy array holding all the possible model predictions (only for classifiers). For regressors this should always be None.
- Raises
- TypeError
The clf object is not a scikit-learn classifier, i.e., it does not inherit from sklearn.base.BaseEstimator. The feature_names parameter is neither a Python list nor None. One of the elements of the feature_names list is not a string. The class_names parameter is neither a Python list nor None. One of the elements of the class_names list is not a string.
- ValueError
Either the feature_names or class_names list is empty. The length of the feature_names list is different from the number of features extracted from the classifier. The length of the class_names list is different from the length of the classes_array extracted from the classifier.
- Warns
- UserWarning
Features number is not given, therefore the length of the features name list cannot be validated. Classes array is not given, therefore the length of class names array cannot be validated.
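The checks listed under Raises and Warns can be pictured with a small standalone helper. This is only a sketch under stated assumptions: validate_names is a hypothetical function, the order of the checks and the message wording are guesses, and none of it is taken from the fatf source.

```python
import warnings
from typing import List, Optional


def validate_names(names: Optional[List[str]],
                   expected_length: Optional[int],
                   label: str) -> bool:
    """Hypothetical helper mirroring the documented checks; not fatf's code."""
    if names is None:
        # Names are optional, so None is always acceptable.
        return True
    if not isinstance(names, list):
        raise TypeError(f'The {label} parameter must be a Python list or None.')
    if not names:
        raise ValueError(f'The {label} list cannot be empty.')
    if not all(isinstance(name, str) for name in names):
        raise TypeError(f'All elements of the {label} list must be strings.')
    if expected_length is None:
        # Nothing to compare against, so the length check is skipped with a warning.
        warnings.warn(f'The length of the {label} list cannot be validated.',
                      UserWarning)
    elif len(names) != expected_length:
        raise ValueError(f'The {label} list should have {expected_length} '
                         f'elements, got {len(names)}.')
    return True


# Two names for a model with two features pass the check.
print(validate_names(['age', 'height'], 2, 'feature_names'))  # True
```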
Methods
- explain_instance(): Generates an explanation of a single data point (instance).
- explain_model(): Generates a model explanation.
- feature_importance(): Computes feature importance.
- map_class(clf_class): Maps a class id output by the classifier to a class name.
_get_classes_array() → Optional[numpy.ndarray]
Retrieves the array with classes that the predictive model can output.
For regressors this method must return None. For classifiers it should return a 1-dimensional numpy array that holds all the possible classification results that the model can output if they can be extracted from self.clf, or None otherwise.
- Returns
- classes_array : Union[numpy.ndarray, None]
A 1-dimensional numpy array holding all the possible model predictions (only for classifiers) or None.
- Raises
- NotImplementedError
This error is always raised since the method is an abstract method.
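One possible way for a subclass to satisfy this contract is to read the classes_ attribute that fitted scikit-learn classifiers expose. The sketch below is an assumption rather than the library's implementation: get_classes_array is a hypothetical free function, and SimpleNamespace objects stand in for fitted models so that the example runs without scikit-learn.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_classes_array(clf: Any, is_classifier: bool) -> Optional[Any]:
    """Sketch: return the model's ``classes_`` attribute or None."""
    if not is_classifier:
        # Regressors have no classes, so the contract requires None.
        return None
    # Fall back to None when the classes cannot be extracted.
    return getattr(clf, 'classes_', None)


# Stand-ins for fitted models; a real scikit-learn classifier would
# hold a numpy array under ``classes_``.
fake_classifier = SimpleNamespace(classes_=['0', 'a', 'aa', 'b'])
fake_regressor = SimpleNamespace(coef_=[0.5, 1.5])

print(get_classes_array(fake_classifier, is_classifier=True))  # ['0', 'a', 'aa', 'b']
print(get_classes_array(fake_regressor, is_classifier=False))  # None
```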
_get_features_number() → Optional[int]
Returns the number of features that the model accepts or None.
If it is possible to extract the number of features (columns) expected by the self.clf predictor, this method should return this number. Otherwise, it must return None.
- Returns
- features_number : Union[integer, None]
The number of features accepted by the classifier or None.
- Raises
- NotImplementedError
This error is always raised since the method is an abstract method.
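As an illustration, the number of features could be read from the n_features_in_ attribute that recent scikit-learn releases set on fitted estimators. This is an assumption about the installed scikit-learn version, not something this page states; get_features_number is a hypothetical function and the models are stand-in objects.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_features_number(clf: Any) -> Optional[int]:
    """Sketch: return ``n_features_in_`` if the model exposes it, else None."""
    features_number = getattr(clf, 'n_features_in_', None)
    if features_number is None:
        # The number of features cannot be extracted from this model.
        return None
    return int(features_number)


# Stand-ins for a fitted model and for a model without the attribute.
fitted_model = SimpleNamespace(n_features_in_=4)
opaque_model = SimpleNamespace()

print(get_features_number(fitted_model))  # 4
print(get_features_number(opaque_model))  # None
```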
_is_classifier() → bool
Indicates whether the clf model is a classifier or a regressor.
This method should return True if the model that this class explains is a classifier and False if it is a regressor.
- Returns
- is_classifier : boolean
True if the self.clf model is a classifier or False when it is a regressor.
- Raises
- NotImplementedError
This error is always raised since the method is an abstract method.
_validate_kind_fitted() → bool
Implements a kind check and a fit check of a predictive model.
This method is called upon initialising the class and checks whether the self.clf predictor is of the right kind. For example, when implementing an explainer for scikit-learn linear models this method should check whether self.clf is a linear model and whether it has been fitted. If either of these conditions is not satisfied, this method should raise an appropriate exception: for a wrong model type this should be a ValueError; for an unfitted model this should be scikit-learn’s sklearn.exceptions.NotFittedError (consider using scikit-learn’s sklearn.utils.validation.check_is_fitted function to raise this exception).
- Returns
- is_valid : boolean
True if the kind of the self.clf model is correct and the model is fitted, False otherwise.
- Raises
- NotImplementedError
This error is always raised since the method is an abstract method.
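The way the four private methods fit together can be sketched with stand-in classes. Everything below is hypothetical and kept free of fatf and scikit-learn so that it runs on its own: BaseExplainerSketch only imitates the documented contract, ToyLinearModel imitates a fitted regressor, and RuntimeError stands in for sklearn.exceptions.NotFittedError.

```python
from typing import Any, List, Optional


class ToyLinearModel:
    """Hypothetical stand-in for a scikit-learn linear regressor."""

    def __init__(self, coef: Optional[List[float]] = None) -> None:
        # ``coef_`` stays None until the model is "fitted".
        self.coef_ = coef


class BaseExplainerSketch:
    """Stand-in for the base class contract; not fatf's implementation."""

    def __init__(self, clf: Any) -> None:
        self.clf = clf
        # The kind and fit check runs when the class is initialised.
        self._validate_kind_fitted()
        self.is_classifier = self._is_classifier()
        self.features_number = self._get_features_number()
        self.classes_array = self._get_classes_array()

    def _validate_kind_fitted(self) -> bool:
        raise NotImplementedError('Override in a subclass.')

    def _is_classifier(self) -> bool:
        raise NotImplementedError('Override in a subclass.')

    def _get_features_number(self) -> Optional[int]:
        raise NotImplementedError('Override in a subclass.')

    def _get_classes_array(self) -> Optional[List[Any]]:
        raise NotImplementedError('Override in a subclass.')


class ToyLinearExplainer(BaseExplainerSketch):
    """Overrides all four private methods for the toy regressor."""

    def _validate_kind_fitted(self) -> bool:
        if not isinstance(self.clf, ToyLinearModel):
            raise ValueError('The model is not a ToyLinearModel.')
        if self.clf.coef_ is None:
            # A real explainer would raise sklearn.exceptions.NotFittedError.
            raise RuntimeError('The model has not been fitted.')
        return True

    def _is_classifier(self) -> bool:
        return False

    def _get_features_number(self) -> Optional[int]:
        return len(self.clf.coef_)

    def _get_classes_array(self) -> Optional[List[Any]]:
        # Regressors must return None.
        return None


explainer = ToyLinearExplainer(ToyLinearModel([0.5, -1.2, 3.0]))
print(explainer.is_classifier, explainer.features_number, explainer.classes_array)
# False 3 None
```

Instantiating the stand-in base class directly fails with NotImplementedError, which mirrors the Raises entries documented for the abstract methods.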
explain_instance() → numpy.ndarray
Generates an explanation of a single data point (instance).
This can be an explanation of a data point from a data set or of a prediction provided by a predictive model.
map_class(clf_class: Union[int, str]) → str
Maps a class id output by the classifier to a class name.
A mapping will only be provided if the class was initialised with class names or an array of possible predictions was extracted from the classifier.
- Parameters
- clf_class : Union[integer, string]
A class id output by the classifier.
- Returns
- mapped_class : string
A class name corresponding to the class id.
- Raises
- RuntimeError
The error is raised when trying to map a class for a regressor. It is also raised if the class was not sufficiently initialised, i.e., either the classes_array or the class_names attribute is missing.
- TypeError
The clf_class parameter is neither an integer nor a string.
- ValueError
The given clf_class is not one of the values that the classifier can output.
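The mapping behaviour described above can be approximated by a standalone function. This is a minimal sketch, not the method's actual body: it takes the attributes as explicit arguments, uses plain lists in place of numpy arrays, and the error messages are invented.

```python
from typing import List, Optional, Union


def map_class(clf_class: Union[int, str],
              classes_array: Optional[List[Union[int, str]]],
              class_names: Optional[List[str]],
              is_classifier: bool = True) -> str:
    """Sketch: map a class id to its name following the documented errors."""
    if not is_classifier:
        raise RuntimeError('Class mapping is not available for regressors.')
    if classes_array is None or class_names is None:
        raise RuntimeError('Both classes_array and class_names are required.')
    if not isinstance(clf_class, (int, str)):
        raise TypeError('The clf_class parameter must be an integer or a string.')
    if clf_class not in classes_array:
        raise ValueError('The classifier cannot output this class id.')
    # Class names follow the same order as the class ids.
    return class_names[classes_array.index(clf_class)]


classes = ['0', 'a', 'aa', 'b']
names = ['zero', 'single a', 'double a', 'single b']  # hypothetical names
print(map_class('aa', classes, names))  # double a
```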