fatf.transparency.predictions.surrogate_explainers.TabularBlimeyLime
class fatf.transparency.predictions.surrogate_explainers.TabularBlimeyLime(dataset: numpy.ndarray, predictive_model: object, as_regressor: bool = False, categorical_indices: Optional[List[Union[str, int]]] = None, class_names: Optional[List[str]] = None, feature_names: Optional[List[str]] = None)

A tabular LIME explainer – a surrogate explainer based on a linear model.
Changed in version 0.1.0: (1) Added support for regression models. (2) Changed the feature selection mechanism from k-LASSO to forward_selection when the number of selected features is less than 7, and highest_weights otherwise – the default LIME behaviour.

New in version 0.0.2.
This class implements Local Interpretable Model-agnostic Explanations (LIME) introduced by [RIBEIRO2016WHY]. This implementation mirrors the one in the official LIME package, which is available therein under the lime.lime_tabular.LimeTabularExplainer class.

This explainer uses a quartile discretiser (fatf.utils.data.discretisation.QuartileDiscretiser) and a normal sampler (fatf.utils.data.augmentation.NormalSampling) for augmenting the data. The following steps are taken to generate the explanation when the explain_instance method is called (a code sketch of steps 4–7 follows this list):

1. The input data_row is discretised using the quartile discretiser. The numerical features are binned and the categorical ones – selected via the categorical_indices parameter – are left unchanged.
2. The data are sampled around the discretised data_row using the normal sampler. Since all of the features are categorical after the discretisation, the bin indices are sampled based on their frequency in (the discretised version of) the dataset used to initialise this class.
3. The sampled data are reverted back to their original domain and predicted with the black-box model (the predictive_model used to initialise this class). Each (numerical) feature value is sampled from the corresponding bin using a truncated normal distribution whose minimum (lower threshold), maximum (upper threshold), mean and standard deviation are computed empirically from all the data points in the dataset whose values of that feature fall into the bin. The categorical features are left unchanged.
4. The discretised sampled data set is binarised by comparing each row with the user-specified data_row (given to the explain_instance method). This step is a logical XNOR of the two – 1 if the feature value in a row of the discretised data set is the same as in the data_row, and 0 if it is different.
5. The Euclidean distance between each binarised sampled data point and the binarised data_row is computed and passed through an exponential kernel (fatf.utils.kernels.exponential_kernel) to get similarity scores, which are used as data point weights both when reducing the number of features (see below) and when training the linear regression.
6. To limit the number of features in the explanation (if enabled by the user), forward selection is used when the number of selected features is less than 7, and highest weights otherwise. (This is controlled by the features_number parameter of the explain_instance method; by default – features_number=None – all of the features are used.)
7. A local (weighted) ridge regression (sklearn.linear_model.Ridge) is fitted to the sampled and binarised data, with the target being:
   - the numerical predictions of the black-box model, when the underlying model is a regressor; or
   - a vector of probabilities output by the black-box model for the selected class (one-vs-rest), when the underlying model is a probabilistic classifier. By default, one model is trained for every class (explained_class=None in the explain_instance method), however the class to be explained can be specified by the user.
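To make steps 4–7 concrete, below is a minimal NumPy/scikit-learn sketch of the binarisation, kernel weighting and weighted ridge fit. The array names (discretised_samples, discretised_row, predictions) are illustrative assumptions and not part of this class's API; the kernel is written in one common exponential form, whereas the exact constants used by fatf.utils.kernels.exponential_kernel may differ. Feature selection (step 6) is omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import Ridge


def fit_local_surrogate(discretised_samples, discretised_row, predictions,
                        kernel_width):
    """Sketch of steps 4-7: binarise, weight and fit a ridge surrogate."""
    # Step 4: XNOR binarisation -- 1 where a sampled value falls into the
    # same bin (or has the same categorical value) as the explained row.
    binarised = (discretised_samples == discretised_row).astype(int)

    # Step 5: Euclidean distances to the binarised explained row (which is
    # all ones, since the row matches itself everywhere), turned into
    # similarity weights with an exponential kernel (form assumed here).
    binarised_row = np.ones(discretised_samples.shape[1], dtype=int)
    distances = np.linalg.norm(binarised - binarised_row, axis=1)
    weights = np.exp(-(distances ** 2) / (2 * kernel_width ** 2))

    # Step 7: weighted ridge regression on the binary features; its
    # coefficients are the feature importances reported in the explanation.
    surrogate = Ridge()
    surrogate.fit(binarised, predictions, sample_weight=weights)
    return surrogate.coef_
```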
Note

How to interpret the results?

Because the local surrogate model is trained on the binarised sampled data, i.e. data passed through the XNOR operation, the parameters extracted from this model (feature importances) should be interpreted as an answer to the following question:

“Had this particular feature value of the explained data point been outside of this range (for numerical features) or had a different value (for categorical features), how would that influence the probability of this point belonging to the explained class (probabilistic classification) / the predicted numerical value (regression)?”
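For example, a hypothetical explanation for one class of a probabilistic classifier could look like the dictionary below; the feature ranges and weights are fabricated for illustration. A positive weight suggests that the feature value falling in the stated range supports the explained class, so moving it outside that range would lower the predicted probability.

```python
# Fabricated example of the feature importance returned for one class.
explanation = {
    'setosa': {
        'petal length (cm) <= 1.60': 0.42,        # being in this bin raises P(setosa)
        'sepal width (cm) > 3.30': 0.07,
        '5.10 < sepal length (cm) <= 5.80': -0.03,  # this bin slightly lowers it
    },
}
```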
This LIME implementation is limited to black-box probabilistic classifiers and regressors (similarly to the official implementation). Therefore, the predictive_model must have a predict_proba method for probabilistic models and a predict method for regressors. When the surrogate is built for a probabilistic classifier, the local model will be trained using the one-vs-rest approach, since the output of the global model is an array with probabilities of each class (the classes to be explained can be selected using the explained_class parameter of the explain_instance method). The column indices indicated as categorical features (via the categorical_indices parameter) will not be discretised.

For detailed instructions on how to build a custom surrogate explainer (to avoid tinkering with this class) please see the How to build LIME yourself (bLIMEy) – Surrogate Tabular Explainers how-to guide.
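A minimal usage sketch is given below. It assumes a scikit-learn classifier trained on the iris data, which satisfies the predict_proba requirement; the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from fatf.transparency.predictions.surrogate_explainers import TabularBlimeyLime

iris = load_iris()
X, y = iris.data, iris.target

# Any black-box model exposing a predict_proba method will do.
model = RandomForestClassifier(random_state=42).fit(X, y)

explainer = TabularBlimeyLime(
    X,
    model,
    class_names=list(iris.target_names),
    feature_names=list(iris.feature_names))

# Explain the first data point; the result maps class names to
# {interpretable feature name: importance} dictionaries.
explanation = explainer.explain_instance(X[0, :], samples_number=500)
```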
For additional parameters, warnings and errors description please see the documentation of the parent class fatf.transparency.predictions.surrogate_explainers.SurrogateTabularExplainer.

RIBEIRO2016WHY
Ribeiro, M. T., Singh, S. and Guestrin, C., 2016, August. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
Parameters

dataset : numpy.ndarray
A 2-dimensional numpy array with a dataset (utilised in various ways throughout the explainer).

predictive_model : object
A pre-trained (black-box) predictive model to be explained. If as_regressor (see below) is set to False, it must have a predict_proba method that takes a data set as the only required input parameter and returns a 2-dimensional numpy array with probabilities of belonging to each class. If as_regressor is set to True, the predictive_model must have a predict method that outputs a 1-dimensional array with numerical predictions.

as_regressor : boolean, optional (default=False)
New in version 0.1.0.
A boolean indicating whether the global model should be treated as a regressor (True) or a probabilistic classifier (False).

categorical_indices : List[column indices], optional (default=None)
A list of column indices in the input dataset that should be treated as categorical features.

class_names : List[string], optional (default=None)
A list of strings defining the names of classes. If the predictive model is probabilistic, the order of the class names should correspond to the order of columns output by the model. For other models the order should correspond to the lexicographical ordering of all the possible outputs of this model. For example, if the model outputs ['a', 'c', '0'], the class names should be given for the ['0', 'a', 'c'] ordering.

feature_names : List[string], optional (default=None)
A list of strings defining the names of the dataset features. The order of the names should correspond to the order of features in the dataset.
Attributes

discretiser : fatf.utils.data.discretisation.Discretiser
An instance of the quartile discretiser (fatf.utils.data.discretisation.QuartileDiscretiser) initialised with the input dataset and used to discretise the data_row when the explain_instance method is called.

augmenter : fatf.utils.data.augmentation.Augmentation
An instance of the normal sampling augmenter (fatf.utils.data.augmentation.NormalSampling) used to sample new data points around the discretised data_row (in the explain_instance method).

bin_sampling_values : Dictionary[dataset column index, Dictionary[discretised bin id, Tuple(float, float, float, float)]]
A dictionary holding characteristics of each bin of each numerical feature. The characteristics are represented as a 4-tuple consisting of: the lower bin boundary, the upper bin boundary, the empirical mean of all the values of this feature for data points (in dataset) falling into that bin, and the empirical standard deviation (calculated in the same way). For the edge bins, if there are data available, the lower edge is calculated empirically (as the minimum of the corresponding feature values falling into that bin); otherwise it is set to -numpy.inf. The same applies to the upper edge, which is either set to numpy.inf or calculated empirically (as the maximum of the corresponding feature values falling into that bin). If there are no data points from which to calculate the mean and standard deviation of a given bin, these two values are set to numpy.nan. (This does not influence the later reverse sampling, for which this attribute is used: since there were no data in a given bin, the frequency of that bin is 0, therefore no data points falling into it will be sampled.)
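As an illustration of how such a 4-tuple drives the reverse sampling (step 3 of the procedure described above), the sketch below draws feature values from a truncated normal distribution with scipy. The dictionary literal is a made-up example, not output of this class.

```python
import numpy as np
from scipy.stats import truncnorm

# Hypothetical bin characteristics for one numerical feature:
# bin id -> (lower boundary, upper boundary, empirical mean, empirical std).
bin_sampling_values = {0: (-np.inf, 4.9, 4.5, 0.3),
                       1: (4.9, 5.8, 5.3, 0.25)}


def sample_from_bin(bin_id, samples_number=10):
    """Draws values for a feature whose discretised value is bin_id."""
    lower, upper, mean, std = bin_sampling_values[bin_id]
    # scipy's truncnorm takes the boundaries in standard-deviation units
    # relative to the mean, hence the rescaling.
    a, b = (lower - mean) / std, (upper - mean) / std
    return truncnorm.rvs(a, b, loc=mean, scale=std, size=samples_number)


print(sample_from_bin(1))
```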
Raises

ImportError
The scikit-learn package is missing.
Methods

explain_instance(data_row, …)
Explains the data_row with linear regression feature importance.
explain_instance(data_row: Union[numpy.ndarray, numpy.void], explained_class: Union[int, str, None] = None, samples_number: int = 50, features_number: Optional[int] = None, kernel_width: Optional[float] = None, return_models: bool = False) → Union[Dict[str, Dict[str, float]], Tuple[Dict[str, Dict[str, float]], Union[Dict[str, fatf.utils.models.models.Model], fatf.utils.models.models.Model]]]

Explains the data_row with linear regression feature importance.

Changed in version 0.1.0: Changed the feature selection mechanism from k-LASSO to forward_selection when the number of selected features is less than 7, and highest_weights otherwise – the default LIME behaviour.

For probabilistic classifiers the explanations will be produced for all of the classes by default. This can be changed by selecting a specific class with the explained_class parameter.

The default kernel_width is computed as the square root of the number of features multiplied by 0.75. Also, by default, all of the (interpretable) features will be used to create an explanation; this can be limited by setting the features_number parameter. The data sampling around the data_row can be customised by specifying the number of points to be generated (samples_number).

By default, this method only returns feature importance; however, by setting return_models to True, it will also return the local linear surrogates for further analysis and processing done outside of this method.

Note

The exact description of the explanation generation procedure can be found in the documentation of this class (fatf.transparency.predictions.surrogate_explainers.TabularBlimeyLime).

For additional parameters, warnings and errors please see the parent class method fatf.transparency.predictions.surrogate_explainers.SurrogateTabularExplainer.explain_instance.
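For instance, the documented default kernel width and the return_models behaviour can be exercised as sketched below. This reuses the X and explainer variables from the earlier usage sketch, and the kernel-width line simply makes the documented default explicit.

```python
import numpy as np

# The documented default kernel width: sqrt(number of features) * 0.75.
default_kernel_width = np.sqrt(X.shape[1]) * 0.75

explanations, models = explainer.explain_instance(
    X[0, :],
    samples_number=500,
    features_number=2,          # keep at most 2 interpretable features
    kernel_width=default_kernel_width,
    return_models=True)

# explanations: {class name: {interpretable feature name: importance}}
# models: {class name: fitted local linear surrogate}
for class_name, feature_importance in explanations.items():
    print(class_name, feature_importance)
```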
Parameters

data_row : Union[numpy.ndarray, numpy.void]
A data point to be explained (1-dimensional numpy array).

explained_class : Union[integer, string], optional (default=None)
The class to be explained – only applicable to probabilistic classifiers. If None, all of the classes will be explained. This can either be the index of the class (the column index of the probabilistic vector) or the class name (taken from self.class_names).

samples_number : integer, optional (default=50)
The number of data points sampled from the normal augmenter, which will be used to fit the local surrogate model.

features_number : integer, optional (default=None)
The maximum number of (interpretable) features – found with forward selection or highest weights – to be used in the explanation (the local surrogate model is trained with this feature subset). By default (None), all of the (interpretable) features are used.

kernel_width : float, optional (default=None)
The width of the exponential kernel used when computing weights of the sampled data based on the distances between the sampled data and the data_row. The default kernel_width (kernel_width=None) is computed as the square root of the number of features multiplied by 0.75.

return_models : boolean, optional (default=False)
If True, this method will return both the feature importance explanation dictionary and a dictionary holding the local models. Otherwise, only the first dictionary will be returned.
Returns

explanations : Dictionary[string, Dictionary[string, float]]
A dictionary holding dictionaries that contain feature importance – the feature names are taken from self.feature_names and the feature importances are extracted from the local linear surrogates. These dictionaries are held under keys corresponding to class names (taken from self.class_names).

models : Dictionary[string, sklearn.linear_model.base.LinearModel], optional
A dictionary holding the locally fitted surrogate linear models under class name keys (taken from self.class_names). This dictionary is only returned when the return_models parameter is set to True.
Raises

TypeError
The explained_class parameter is neither None, an integer, nor a string. The samples_number parameter is not an integer. The features_number parameter is neither None nor an integer. The kernel_width parameter is neither None nor a number. The return_models parameter is not a boolean.

ValueError
The samples_number parameter is a non-positive integer (smaller than 1). The features_number parameter is a non-positive integer (smaller than 1). The kernel_width parameter is a non-positive number (smaller than or equal to 0). The explained_class specified by the user could neither be recognised as one of the allowed class names (self.class_names) nor as an index of a class name.