fatf.transparency.predictions.surrogate_explainers.SurrogateTabularExplainer
class fatf.transparency.predictions.surrogate_explainers.SurrogateTabularExplainer(dataset: numpy.ndarray, predictive_model: object, as_probabilistic: bool = True, as_regressor: bool = False, categorical_indices: Optional[List[Union[str, int]]] = None, class_names: Optional[List[str]] = None, classes_number: Optional[int] = None, feature_names: Optional[List[str]] = None, unique_predictions: Optional[List[Union[str, int]]] = None)
An abstract parent class for implementing surrogate explainers.
Changed in version 0.1.0: Added support for regression models.
New in version 0.0.2.
An abstract class that all surrogate explainer classes should inherit from. It contains an __init__ method and an input validator (_explain_instance_input_is_valid) for the abstract explain_instance method. The validation of the input parameters passed to the __init__ method is done via the fatf.transparency.predictions.surrogate_explainers._input_is_valid function.

If the predictive_model is a non-probabilistic classifier (as_probabilistic=False and as_regressor=False), it is advised to specify both the classes_number and unique_predictions parameters to ensure proper functionality of the explainer. Please see the respective parameter descriptions for more details.

For detailed instructions on how to build your own surrogate, please see the How to build LIME yourself (bLIMEy) – Surrogate Tabular Explainers how-to guide.
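The inheritance contract described above (an abstract explain_instance that every child must implement, plus a validator the child should call first) can be sketched with a self-contained toy hierarchy. ToySurrogateExplainer, ToyMeanDifferenceExplainer and the body of the validator are hypothetical stand-ins, not fatf code; in practice the child would subclass SurrogateTabularExplainer and call its own _explain_instance_input_is_valid method.

```python
import abc

import numpy as np


class ToySurrogateExplainer(abc.ABC):
    """Hypothetical stand-in for an abstract surrogate explainer parent."""

    def __init__(self, dataset: np.ndarray) -> None:
        # The parent stores the dataset so that children can use it.
        self.dataset = dataset

    def _explain_instance_input_is_valid(self, data_row: np.ndarray) -> bool:
        # Hypothetical validator: a 1-D row with as many features as the dataset.
        if data_row.ndim != 1 or data_row.shape[0] != self.dataset.shape[1]:
            raise ValueError('The data_row does not match the dataset shape.')
        return True

    @abc.abstractmethod
    def explain_instance(self, data_row: np.ndarray):
        # Abstract: children must override this method.
        raise NotImplementedError('explain_instance must be implemented.')


class ToyMeanDifferenceExplainer(ToySurrogateExplainer):
    """Hypothetical child: explains a row by its deviation from column means."""

    def explain_instance(self, data_row: np.ndarray) -> np.ndarray:
        # Validate the input before computing the explanation.
        self._explain_instance_input_is_valid(data_row)
        return data_row - self.dataset.mean(axis=0)


data = np.array([[0.0, 1.0], [2.0, 3.0]])
explainer = ToyMeanDifferenceExplainer(data)
print(explainer.explain_instance(np.array([1.0, 1.0])))
```

Because the parent declares explain_instance as abstract, instantiating ToySurrogateExplainer directly fails with a TypeError, while the child can be created and used as normal.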
Warning
The _explain_instance_input_is_valid method should be called in all implementations of the explain_instance method in the child classes to ensure that all of the input parameters passed to this method are valid.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array with a dataset (utilised in various ways throughout the explainer).
- predictive_model : object
A pre-trained (black-box) predictive model to be explained. If as_probabilistic (see below) is set to True, it must have a predict_proba method that takes a data set as the only required input parameter and returns a 2-dimensional numpy array with probabilities of belonging to each class. Otherwise, if as_probabilistic is set to False or as_regressor is set to True, the predictive_model must have a predict method that outputs a 1-dimensional array with (class or numerical) predictions.
- as_probabilistic : boolean, optional (default=True)
A boolean indicating whether the global model is probabilistic. If True, the predictive_model must have a predict_proba method. If False, the predictive_model must have a predict method. This parameter is disregarded when as_regressor=True.
- as_regressor : boolean, optional (default=False)
New in version 0.1.0.
A boolean indicating whether the global model is a regressor. If True, the predictive_model must have a predict method; the as_probabilistic parameter is disregarded. If False, the model is treated as a classifier (see the as_probabilistic parameter).
- categorical_indices : List[column indices], optional (default=None)
A list of column indices in the input dataset that should be treated as categorical features.
- class_names : List[string], optional (default=None)
A list of strings defining the names of classes. If the predictive model is probabilistic, the order of the class names should correspond to the order of columns output by the model. For non-probabilistic classifiers, the order should correspond to the lexicographical ordering of all the possible outputs of the model. For example, if the model outputs ['a', 'c', '0'], the class names should be given in the ['0', 'a', 'c'] order. This parameter is disregarded when as_regressor=True.
- classes_number : integer, optional (default=None)
The number of unique classes modelled by the predictive_model. If the model is probabilistic, setting this parameter is not required as the number of classes will be inferred from the shape of the predicted probabilities array. For non-probabilistic models, if this parameter is not given, this number will be inferred from the length of the class_names list if provided; otherwise the input dataset will be predicted and the unique values will be counted therein. Since the latter method cannot guarantee counting all the possible predictions, a UserWarning will be emitted encouraging the user to specify the number of classes via the classes_number parameter. For non-probabilistic models it is advised to specify this parameter. This parameter is disregarded when as_regressor=True.
- feature_names : List[string], optional (default=None)
A list of strings defining the names of the dataset features. The order of the names should correspond to the order of features in the dataset.
- unique_predictions : List[strings or integers], optional (default=None)
A complete list of unique predictions that the predictive_model can output. This parameter is only used when the predictive_model is a non-probabilistic classifier (as_probabilistic=False). This parameter is disregarded when as_regressor=True.
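For a non-probabilistic classifier, the parameter descriptions above imply an inference order for the number of classes: classes_number if given, else the length of class_names, else the number of unique predictions on the dataset (with a UserWarning). The helper below is a hypothetical illustration of that order, not part of fatf; toy_predict is an invented three-class model whose third class never appears on the sample dataset, showing why the fallback can undercount.

```python
import warnings
from typing import Callable, List, Optional

import numpy as np


def infer_classes_number(dataset: np.ndarray,
                         predict: Callable[[np.ndarray], np.ndarray],
                         classes_number: Optional[int] = None,
                         class_names: Optional[List[str]] = None) -> int:
    """Hypothetical helper mirroring the documented inference order."""
    if classes_number is not None:
        return classes_number
    if class_names is not None:
        return len(class_names)
    # Fall back to counting the unique predictions on the dataset.
    unique_predictions = np.unique(predict(dataset))
    warnings.warn('The number of classes was inferred from the predictions '
                  'on the dataset and may be incomplete; consider setting '
                  'classes_number.', UserWarning)
    return int(unique_predictions.shape[0])


def toy_predict(data: np.ndarray) -> np.ndarray:
    # Hypothetical 3-class model: 'a' if x > 0, 'c' if x < 0, 'b' if x == 0.
    x = data[:, 0]
    return np.where(x > 0, 'a', np.where(x < 0, 'c', 'b'))


dataset = np.array([[1.0], [-1.0], [2.0]])  # no row yields 'b'
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    n = infer_classes_number(dataset, toy_predict)
print(n)  # 2, although the model can also output 'b'

# For non-probabilistic models class_names follow the lexicographic order of
# the possible outputs, e.g. outputs ['a', 'c', '0'] -> ['0', 'a', 'c'].
print(sorted(['a', 'c', '0']))  # ['0', 'a', 'c']
```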
- Attributes
- dataset : numpy.ndarray
A 2-dimensional numpy array with the input dataset.
- is_structured : boolean
True if the dataset is a structured numpy array, False otherwise.
- column_indices : List[column indices]
A list of column indices in the order they appear in the dataset.
- categorical_indices : List[column indices]
A list of column indices that should be treated as categorical features.
- numerical_indices : List[column indices]
A list of column indices that should be treated as numerical features.
- as_probabilistic : boolean
True if the predictive_model should be treated as probabilistic and False if it should be treated as a non-probabilistic classifier.
- as_regressor : boolean
True if the predictive_model should be treated as a regressor and False if it should be treated as a (probabilistic or non-probabilistic) classifier.
- predictive_model : object
A pre-trained (black-box) predictive model to be explained.
- predictive_function : Callable[[np.ndarray], np.ndarray]
A function that will be used to get predictions from the input predictive model. It references the predictive_model.predict_proba method for probabilistic classifiers (as_probabilistic=True and as_regressor=False) and the predictive_model.predict method otherwise (non-probabilistic classifiers and regressors).
- classes_number : integer
The number of unique classes that the predictive_model is trained to recognise.
- class_names : List[string]
A list of strings defining the names of classes. If this was not specified by the user, the classes will be assigned names based on the following pattern: ‘class %d’. If the predictive_model is a non-probabilistic classifier (as_probabilistic=False) and the number of unique predictions is equal to the number of classes, the class_names will be a lexicographically sorted list of the unique values output by the predictive_model.
- feature_names : List[string]
A list of strings defining the names of the features. If this was not specified by the user, the features will be assigned names based on the following pattern: ‘feature %d’.
- unique_predictions : List[strings or integers] or None
None for a probabilistic predictive_model (as_probabilistic=True) and a list of unique classes output by the predictive_model if it is non-probabilistic.
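How as_probabilistic and as_regressor determine predictive_function can be sketched as below. select_predictive_function is a hypothetical helper written for illustration, and the local IncompatibleModelError class merely stands in for the exception fatf raises; neither is the library's own implementation.

```python
from typing import Any, Callable


class IncompatibleModelError(Exception):
    """Local stand-in for the exception raised by fatf."""


def select_predictive_function(model: Any,
                               as_probabilistic: bool = True,
                               as_regressor: bool = False) -> Callable:
    """Hypothetical helper: pick predict_proba or predict as documented."""
    # Regressors always use predict; as_probabilistic is then disregarded.
    if as_regressor or not as_probabilistic:
        method_name = 'predict'
    else:
        method_name = 'predict_proba'
    method = getattr(model, method_name, None)
    if not callable(method):
        raise IncompatibleModelError(
            'The predictive_model lacks a {} method.'.format(method_name))
    return method


class OnlyPredict:
    """Toy model exposing predict but not predict_proba."""

    def predict(self, data):
        return [0 for _ in data]


model = OnlyPredict()
# Works for a regressor even though as_probabilistic defaults to True.
predictive_function = select_predictive_function(model, as_regressor=True)
print(predictive_function([[1], [2]]))  # [0, 0]
```

With the defaults (as_probabilistic=True, as_regressor=False) the same toy model would be rejected, since it has no predict_proba method.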
- Raises
- IncompatibleModelError
The predictive_model does not have the required functionality: a predict_proba method for probabilistic models and a predict method for regressors and non-probabilistic classifiers.
- IncorrectShapeError
The input dataset is not a 2-dimensional numpy array.
- IndexError
Some of the column indices given in the categorical_indices parameter are not valid for the input dataset.
- RuntimeError
The number of columns in the probabilistic matrix output by the predictive_model (when as_probabilistic=True) is different from the number of classes specified by the user via the classes_number parameter. The predictive_model has output different unique classes to the ones specified by the user via the unique_predictions parameter (checked only for non-probabilistic models, i.e., as_probabilistic=False). Either the user-specified or the inferred number of unique predictions does not agree with the internal number of classes.
- TypeError
The input dataset is not of a base (numerical and/or string) type. The as_probabilistic parameter is not a boolean. The as_regressor parameter is not a boolean. The categorical_indices parameter is neither a list nor None. The class_names parameter is neither a list nor None, or one of the elements in this list is not a string. The classes_number parameter is neither an integer nor None. The feature_names parameter is neither a list nor None, or one of the elements in this list is not a string. The unique_predictions parameter is neither a list nor None, or not all of the elements in this list are of string or integer type.
- ValueError
The categorical_indices list contains duplicated entries. The length of the class_names list is not equal to the detected or given number of classes, some of the entries in this list are duplicated, or this list is empty. The length of the feature_names list is not the same as the number of features in the input dataset, or some of the entries in that list are duplicated. The classes_number parameter is smaller than 2. The unique_predictions list is empty or contains duplicated entries.
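Several of the conditions listed under Raises can be expressed as plain checks. The function below is a hypothetical, partial re-implementation covering only the categorical_indices, classes_number and class_names rules; it is not fatf's _input_is_valid function and omits the model- and dataset-related checks.

```python
from typing import List, Optional, Union


def check_parameters(categorical_indices: Optional[List[Union[str, int]]] = None,
                     classes_number: Optional[int] = None,
                     class_names: Optional[List[str]] = None) -> bool:
    """Hypothetical partial validator mirroring some documented errors."""
    if categorical_indices is not None:
        if not isinstance(categorical_indices, list):
            raise TypeError('categorical_indices must be a list or None.')
        if len(set(categorical_indices)) != len(categorical_indices):
            raise ValueError('categorical_indices contains duplicates.')
    if classes_number is not None:
        # bool is a subclass of int, so this sketch excludes it explicitly.
        if isinstance(classes_number, bool) or not isinstance(classes_number, int):
            raise TypeError('classes_number must be an integer or None.')
        if classes_number < 2:
            raise ValueError('classes_number must be at least 2.')
    if class_names is not None:
        if not isinstance(class_names, list) or not all(
                isinstance(name, str) for name in class_names):
            raise TypeError('class_names must be a list of strings or None.')
        if not class_names or len(set(class_names)) != len(class_names):
            raise ValueError('class_names is empty or has duplicates.')
        if classes_number is not None and len(class_names) != classes_number:
            raise ValueError('class_names length differs from classes_number.')
    return True


print(check_parameters(categorical_indices=[0, 2], classes_number=3))  # True
```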
- Warns
- UserWarning
If some of the string-based columns in the input data array were not indicated to be categorical features by the user (via the categorical_indices parameter), the user is warned that they will be added to the list of categorical features. When the classes_number parameter is not specified for a non-probabilistic model and the number of classes cannot be inferred from the length of the class_names list, the number of classes is computed as the number of unique predictions of the predictive_model for the input dataset, which may not be accurate; the user is warned and advised to specify the classes_number parameter in this case. When the user provides unique predictions via the unique_predictions parameter for a probabilistic model (as_probabilistic=True), the user is warned that they are not needed and will be disregarded. When the unique predictions have to be inferred from the predictions output by the (non-probabilistic) predictive_model, the user is warned that they may be incomplete; it is advised to provide this list via the unique_predictions parameter to ensure proper functionality of the explainer.
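The first warning above, together with the is_structured, column_indices, categorical_indices and numerical_indices attributes, can be illustrated on a structured numpy array. split_indices is a hypothetical helper, not fatf code. It assumes that a structured array is indexed by field name (a plain array by integer position), treats columns with a string dtype kind ('U' or 'S') as string-based, and appends any such column missing from categorical_indices while emitting a UserWarning.

```python
import warnings
from typing import List, Optional, Tuple, Union

import numpy as np

Index = Union[str, int]


def split_indices(dataset: np.ndarray,
                  categorical_indices: Optional[List[Index]] = None
                  ) -> Tuple[List[Index], List[Index]]:
    """Hypothetical helper returning (categorical, numerical) indices."""
    categorical = list(categorical_indices or [])
    is_structured = dataset.dtype.names is not None
    if is_structured:
        # Structured arrays are indexed by field name.
        column_indices = list(dataset.dtype.names)
        textual = [name for name in column_indices
                   if dataset.dtype[name].kind in 'US']
    else:
        # Plain arrays are indexed by position; all columns share one dtype.
        column_indices = list(range(dataset.shape[1]))
        textual = column_indices if dataset.dtype.kind in 'US' else []
    missing = [index for index in textual if index not in categorical]
    if missing:
        warnings.warn('Some string-based columns were not indicated as '
                      'categorical; adding them: {}.'.format(missing),
                      UserWarning)
        categorical += missing
    numerical = [index for index in column_indices if index not in categorical]
    return categorical, numerical


data = np.array([(1.5, 'red', 3), (0.2, 'blue', 4)],
                dtype=[('width', float), ('colour', 'U4'), ('count', int)])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    categorical, numerical = split_indices(data, categorical_indices=['count'])
print(categorical, numerical)  # ['count', 'colour'] ['width']
```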
Methods
explain_instance(data_row: Union[numpy.ndarray, numpy.void]): Explains a data_row.
explain_instance(data_row: Union[numpy.ndarray, numpy.void]) → Any
Explains a data_row.
This is an abstract method that must be implemented by each child class of this abstract class.
Warning
The _explain_instance_input_is_valid method should be called in all implementations of the explain_instance method in the child classes to ensure that all of the input parameters passed to this method are valid.
- Parameters
- data_row : Union[numpy.ndarray, numpy.void]
A data point to be explained.
- Returns
- explanation : Any
An explanation of the data_row.
- Raises
- IncorrectShapeError
The data_row is not a 1-dimensional numpy array-like object. The number of features (columns) in the data_row is different from the number of features in the data array used to initialise this object.
- NotImplementedError
This is an abstract method, hence has not been implemented.
- TypeError
The dtype of the data_row is different from the dtype of the data array used to initialise this object.
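The conditions listed above for explain_instance can be sketched as a standalone check. validate_data_row and the local IncorrectShapeError class are hypothetical stand-ins, not fatf's _explain_instance_input_is_valid. The sketch treats a numpy.void (a single row of a structured array) as 1-dimensional by counting its fields, and it compares dtypes for strict equality, which is a simplification of whatever compatibility rule the library applies.

```python
from typing import Union

import numpy as np


class IncorrectShapeError(Exception):
    """Local stand-in for the exception raised by fatf."""


def validate_data_row(data_row: Union[np.ndarray, np.void],
                      dataset: np.ndarray) -> bool:
    """Hypothetical check of a data_row against the initialising dataset."""
    if dataset.dtype.names is not None:
        # Structured data: a row is a numpy.void with the same fields.
        if not isinstance(data_row, np.void) or data_row.dtype.names is None:
            raise IncorrectShapeError('The data_row must be a structured row.')
        if len(data_row.dtype.names) != len(dataset.dtype.names):
            raise IncorrectShapeError('The number of features differs.')
        if data_row.dtype != dataset.dtype:
            raise TypeError('The dtype of the data_row differs.')
        return True
    # Unstructured data: a 1-dimensional array with matching length.
    if not isinstance(data_row, np.ndarray) or data_row.ndim != 1:
        raise IncorrectShapeError('The data_row must be 1-dimensional.')
    if data_row.shape[0] != dataset.shape[1]:
        raise IncorrectShapeError('The number of features differs.')
    if data_row.dtype != dataset.dtype:
        raise TypeError('The dtype of the data_row differs.')
    return True


data = np.array([[0.0, 1.0], [2.0, 3.0]])
print(validate_data_row(data[0], data))  # True
```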