fatf.transparency.predictions.surrogate_explainers.SurrogateTabularExplainer
class fatf.transparency.predictions.surrogate_explainers.SurrogateTabularExplainer(dataset: numpy.ndarray, predictive_model: object, as_probabilistic: bool = True, as_regressor: bool = False, categorical_indices: Optional[List[Union[str, int]]] = None, class_names: Optional[List[str]] = None, classes_number: Optional[int] = None, feature_names: Optional[List[str]] = None, unique_predictions: Optional[List[Union[str, int]]] = None)
An abstract parent class for implementing surrogate explainers.
Changed in version 0.1.0: Added support for regression models.
New in version 0.0.2.
An abstract class that all surrogate explainer classes should inherit from. It contains an __init__ method and an input validator, _explain_instance_input_is_valid, for the abstract explain_instance method. The input parameters passed to the __init__ method are validated by the fatf.transparency.predictions.surrogate_explainers._input_is_valid function.
If the predictive_model is a non-probabilistic classifier (as_probabilistic=False and as_regressor=False), it is advised to specify both the classes_number and unique_predictions parameters to ensure proper functionality of the explainer. Please see the respective parameter descriptions for more details.
For detailed instructions on how to build your own surrogate, please see the How to build LIME yourself (bLIMEy) – Surrogate Tabular Explainers how-to guide.
Warning
The _explain_instance_input_is_valid method should be called in all implementations of the explain_instance method in the children classes to ensure that all of the input parameters passed to this method are valid.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array with a dataset (utilised in various ways throughout the explainer).
- predictive_model : object
A pre-trained (black-box) predictive model to be explained. If as_probabilistic (see below) is set to True, it must have a predict_proba method that takes a data set as the only required input parameter and returns a 2-dimensional numpy array with probabilities of belonging to each class. Otherwise, if as_probabilistic is set to False, the predictive_model must have a predict method that outputs a 1-dimensional array with (class) predictions.
- as_probabilistic : boolean, optional (default=True)
A boolean indicating whether the global model is probabilistic. If True, the predictive_model must have a predict_proba method. If False, the predictive_model must have a predict method. This parameter is disregarded when as_regressor=True.
- as_regressor : boolean, optional (default=False)
New in version 0.1.0.
A boolean indicating whether the global model is a regression model. If True, the predictive_model must have a predict method; the as_probabilistic parameter is disregarded. If False, the model is treated as a classifier (see the as_probabilistic parameter).
- categorical_indices : List[column indices], optional (default=None)
A list of column indices in the input dataset that should be treated as categorical features.
- class_names : List[string], optional (default=None)
A list of strings defining the names of classes. If the predictive model is probabilistic, the order of the class names should correspond to the order of columns output by the model. For other models the order should correspond to lexicographical ordering of all the possible outputs of this model. For example, if the model outputs ['a', 'c', '0'] the class names should be given for ['0', 'a', 'c'] ordering. This parameter is disregarded when as_regressor=True.
- classes_number : integer, optional (default=None)
The number of unique classes modelled by the predictive_model. If the model is probabilistic, setting this parameter is not required as the number of classes will be inferred from the shape of the predicted probabilities array. For non-probabilistic models, if this parameter is not given, the number will be inferred from the length of the class_names list if provided; otherwise the input dataset will be predicted and the unique values will be counted therein. Since the latter method cannot guarantee counting all the possible predictions, a UserWarning will be emitted encouraging the user to specify the number of classes via the classes_number parameter. For non-probabilistic models it is advised to specify this parameter. This parameter is disregarded when as_regressor=True.
- feature_names : List[string], optional (default=None)
A list of strings defining the names of the dataset features. The order of the names should correspond to the order of features in the dataset.
- unique_predictions : List[strings or integers], optional (default=None)
A complete list of unique predictions that the predictive_model can output. This parameter is only used when the predictive_model is a non-probabilistic classifier (as_probabilistic=False). This parameter is disregarded when as_regressor=True.
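The fallback order described for classes_number can be sketched as follows. This is a minimal illustration and not the library's implementation: the infer_classes_number helper and the ThresholdModel class are hypothetical names introduced here, and only numpy and the standard warnings module are assumed.

```python
import warnings

import numpy as np


class ThresholdModel:
    """A hypothetical non-probabilistic classifier with a ``predict`` method."""

    def predict(self, data):
        # Label a row 'high' when its first feature exceeds 0.5, else 'low'.
        return np.where(data[:, 0] > 0.5, 'high', 'low')


def infer_classes_number(dataset, model, classes_number=None,
                         class_names=None):
    """Mimics the documented fallback order (hypothetical helper)."""
    if classes_number is not None:
        return classes_number
    if class_names is not None:
        return len(class_names)
    # Last resort: count the unique predictions on the dataset, which may
    # miss classes that the model never predicts for these particular rows.
    warnings.warn(
        'The number of classes was inferred from the predictions; consider '
        'specifying classes_number explicitly.', UserWarning)
    return np.unique(model.predict(dataset)).shape[0]


dataset = np.array([[0.9, 1.0], [0.8, 0.0], [0.7, 2.0]])  # only 'high' rows
model = ThresholdModel()

explicit = infer_classes_number(dataset, model, classes_number=2)
from_names = infer_classes_number(dataset, model, class_names=['high', 'low'])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    inferred = infer_classes_number(dataset, model)
```

In this toy data set every row is predicted as 'high', so the inferred count falls short of the two labels the model can output. This is the situation in which passing classes_number explicitly is advised.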
- Attributes
- dataset : numpy.ndarray
A 2-dimensional numpy array with the input dataset.
- is_structured : boolean
True if the dataset is a structured numpy array, False otherwise.
- column_indices : List[column indices]
A list of column indices in the order they appear in the dataset.
- categorical_indices : List[column indices]
A list of column indices that should be treated as categorical features.
- numerical_indices : List[column indices]
A list of column indices that should be treated as numerical features.
- as_probabilistic : boolean
True if the predictive_model should be treated as probabilistic and False if it should be treated as a classifier.
- as_regressor : boolean
True if the predictive_model should be treated as regression and False if it should be treated as a (probabilistic) classifier.
- predictive_model : object
A pre-trained (black-box) predictive model to be explained.
- predictive_function : Callable[[np.ndarray], np.ndarray]
A function that will be used to get predictions from the input predictive model. It references the predictive_model.predict_proba method for probabilistic models (as_probabilistic=True) and the predictive_model.predict method for non-probabilistic models.
- classes_number : integer
The number of unique classes that the predictive_model is trained to recognise.
- class_names : List[string]
A list of strings defining the names of classes. If this was not specified by the user, the classes will be assigned names based on the following pattern: ‘class %d’. If the predictive_model is a classifier (as_probabilistic=False) and the number of unique predictions is equal to the number of classes, the class_names will be a lexicographically sorted list of the unique values output by the predictive_model.
- feature_names : List[string]
A list of strings defining the names of the features. If this was not specified by the user, the features will be assigned names based on the following pattern: ‘feature %d’.
- unique_predictions : List[strings or integers] or None
None for a probabilistic predictive_model (as_probabilistic=True) and a list of unique classes output by the predictive_model if it is non-probabilistic.
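The default naming and ordering rules for class_names and feature_names can be illustrated with plain Python. This sketch only mirrors the patterns quoted above (‘class %d’ and ‘feature %d’). The variable names are illustrative, and zero-based numbering is an assumption, as this page does not state where the counter starts.

```python
# Hypothetical sizes chosen for illustration.
classes_number = 3
features_number = 2

# Names generated from the documented patterns (zero-based index assumed).
default_class_names = ['class %d' % i for i in range(classes_number)]
default_feature_names = ['feature %d' % i for i in range(features_number)]

# For a non-probabilistic classifier, class names are expected in the
# lexicographical order of the model's possible outputs.
model_outputs = ['a', 'c', '0']
ordered_outputs = sorted(model_outputs)
```

Sorting the example outputs places '0' before 'a' and 'c', which matches the ordering described for the class_names parameter.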
- Raises
- IncompatibleModelError
The predictive_model does not have the required functionality: a predict_proba method for probabilistic models and a predict method for regressors and non-probabilistic classifiers.
- IncorrectShapeError
The input dataset is not a 2-dimensional numpy array.
- IndexError
Some of the column indices given in the categorical_indices parameter are not valid for the input dataset.
- RuntimeError
The number of columns in the probabilistic matrix output by the predictive_model (when as_probabilistic=True) is different to the number of classes specified by the user via the classes_number parameter.
The predictive_model has output different unique classes to the ones specified by the user via the unique_predictions parameter (checked only for non-probabilistic models, i.e., as_probabilistic=False).
Either the user-specified or inferred number of unique predictions does not agree with the internal number of classes.
- TypeError
The input dataset is not of a base (numerical and/or string) type.
The as_probabilistic parameter is not a boolean.
The as_regressor parameter is not a boolean.
The categorical_indices parameter is neither a list nor None.
The class_names parameter is neither a list nor None or one of the elements in this list is not a string.
The classes_number parameter is neither an integer nor None.
The feature_names parameter is neither a list nor None or one of the elements in this list is not a string.
The unique_predictions parameter is neither a list nor None or all the elements in this list are not of string or integer type.
- ValueError
The categorical_indices list contains duplicated entries.
The length of the class_names list is not equal to the detected or given number of classes, some of the entries in this list are duplicated or this list is empty.
The length of the feature_names list is not the same as the number of features in the input dataset or some of the entries in that list are duplicated.
The classes_number parameter is smaller than 2.
The unique_predictions list is empty or contains duplicated entries.
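How the prediction function might be chosen, and when an incompatible model is rejected, can be sketched as below. This is an assumption-laden illustration rather than the library's code: select_predictive_function, ProbabilisticModel and the local IncompatibleModelError stand-in are hypothetical names defined here so that the snippet runs without fatf installed.

```python
import numpy as np


class IncompatibleModelError(Exception):
    """Local stand-in for the exception of the same name (assumption)."""


class ProbabilisticModel:
    """A hypothetical model exposing both prediction methods."""

    def predict_proba(self, data):
        # Two classes with fixed probabilities for every row.
        return np.tile([0.25, 0.75], (data.shape[0], 1))

    def predict(self, data):
        return np.argmax(self.predict_proba(data), axis=1)


def select_predictive_function(model, as_probabilistic=True,
                               as_regressor=False):
    """Picks ``predict_proba`` or ``predict`` following the documented rules."""
    # as_probabilistic is disregarded for regressors.
    use_proba = as_probabilistic and not as_regressor
    method_name = 'predict_proba' if use_proba else 'predict'
    method = getattr(model, method_name, None)
    if not callable(method):
        raise IncompatibleModelError(
            'The model lacks a {} method.'.format(method_name))
    return method


data = np.zeros((3, 2))
model = ProbabilisticModel()
probabilities = select_predictive_function(model)(data)
labels = select_predictive_function(model, as_probabilistic=False)(data)
regressor_style = select_predictive_function(model, as_regressor=True)
```

Passing an object without the required method, for example a bare object(), raises the stand-in IncompatibleModelError in this sketch.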
- Warns
- UserWarning
If some of the string-based columns in the input data array were not indicated to be categorical features by the user (via the categorical_indices parameter), the user is warned that they will be added to the list of categorical features.
When the classes_number parameter is not specified for a non-probabilistic model and the number of classes cannot be inferred from the length of the class_names list, the number of classes is computed as the number of unique predictions of the predictive_model for the input dataset, which may not be accurate. The user is warned and advised to specify the classes_number parameter in this case.
The user has provided unique predictions via the unique_predictions parameter for a probabilistic model (as_probabilistic=True), which are not needed and will be disregarded.
The unique predictions had to be inferred from the predictions output by the (non-probabilistic) predictive_model and may therefore be incomplete. It is advised to provide this list via the unique_predictions parameter to ensure proper functionality of the explainer.
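The first warning concerns string-based columns that were not declared categorical. A rough sketch of that behaviour for a structured numpy array is shown below. merge_string_columns is a hypothetical helper, and treating the dtype kinds 'U' and 'S' as string-based is an assumption of this sketch rather than a statement about the library's internals.

```python
import warnings

import numpy as np


def merge_string_columns(dataset, categorical_indices=None):
    """Adds undeclared string columns to the categorical indices."""
    declared = set(categorical_indices or [])
    # Unicode ('U') and byte-string ('S') fields are treated as text here.
    string_columns = {
        name for name in dataset.dtype.names
        if dataset.dtype[name].kind in ('U', 'S')
    }
    undeclared = string_columns - declared
    if undeclared:
        warnings.warn(
            'String-based columns {} were added to the categorical '
            'features.'.format(sorted(undeclared)), UserWarning)
    return sorted(declared | string_columns)


dataset = np.array(
    [(1.5, 'red', 0), (2.5, 'blue', 1)],
    dtype=[('size', 'f8'), ('colour', 'U5'), ('flag', 'i8')])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    categorical = merge_string_columns(dataset, categorical_indices=['flag'])
```

Recording the warning with warnings.catch_warnings, as done here, is one way to check in a test suite whether an explainer silently extended the list of categorical features.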
Methods
explain_instance(data_row): Explains a data_row.
explain_instance(data_row: Union[numpy.ndarray, numpy.void]) → Any
Explains a data_row.
This is an abstract method that must be implemented for each child object of this abstract class.
Warning
The _explain_instance_input_is_valid method should be called in all implementations of the explain_instance method in the children classes to ensure that all of the input parameters passed to this method are valid.
- Parameters
- data_row : Union[numpy.ndarray, numpy.void]
A data point to be explained.
- Returns
- explanation : Any
An explanation of the data_row.
- Raises
- IncorrectShapeError
The data_row is not a 1-dimensional numpy array-like object.
The number of features (columns) in the data_row is different to the number of features in the data array used to initialise this object.
- NotImplementedError
This is an abstract method, hence has not been implemented.
- TypeError
The dtype of the data_row is different from the dtype of the data array used to initialise this object.
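The contract for child classes (validate first, then explain) can be sketched with a self-contained stand-in. Nothing below imports fatf: BaseExplainer, MeanDifferenceExplainer and the local IncorrectShapeError are hypothetical, the boolean-returning validator is an assumption modelled on the method name _explain_instance_input_is_valid, and only unstructured numeric arrays are handled.

```python
import abc

import numpy as np


class IncorrectShapeError(Exception):
    """Local stand-in for the exception of the same name (assumption)."""


class BaseExplainer(abc.ABC):
    """A simplified, hypothetical stand-in for the abstract parent class."""

    def __init__(self, dataset):
        self.dataset = dataset

    def _explain_instance_input_is_valid(self, data_row):
        # Mirrors the documented checks: dimensionality, width and dtype.
        if data_row.ndim != 1:
            raise IncorrectShapeError('The data_row must be 1-dimensional.')
        if data_row.shape[0] != self.dataset.shape[1]:
            raise IncorrectShapeError('The data_row has a wrong width.')
        if data_row.dtype != self.dataset.dtype:
            raise TypeError('The data_row dtype differs from the dataset.')
        return True

    @abc.abstractmethod
    def explain_instance(self, data_row):
        """Explains a ``data_row``; children must implement this."""


class MeanDifferenceExplainer(BaseExplainer):
    """A toy child that reports each feature's offset from the column mean."""

    def explain_instance(self, data_row):
        # Validate before doing any work, as the warning above advises.
        assert self._explain_instance_input_is_valid(data_row)
        return data_row - self.dataset.mean(axis=0)


dataset = np.array([[1.0, 2.0], [3.0, 6.0]])
explainer = MeanDifferenceExplainer(dataset)
explanation = explainer.explain_instance(np.array([2.0, 5.0]))
```

In this sketch a 2-dimensional data_row triggers the stand-in IncorrectShapeError, and an integer-typed row passed to an explainer built on a float data set triggers a TypeError, mirroring the Raises entries listed for explain_instance.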