fatf.transparency.predictions.counterfactuals.CounterfactualExplainer
class fatf.transparency.predictions.counterfactuals.CounterfactualExplainer(model: Optional[object] = None, predictive_function: Optional[Callable] = None, dataset: Optional[numpy.ndarray] = None, categorical_indices: Optional[List[Union[str, int]]] = None, numerical_indices: Optional[List[Union[str, int]]] = None, counterfactual_feature_indices: Optional[List[Union[str, int]]] = None, max_counterfactual_length: int = 2, feature_ranges: Optional[Dict[Union[int, str], Union[Tuple[float, float], List[Union[float, str]]]]] = None, distance_functions: Optional[Dict[Union[int, str], Callable]] = None, step_sizes: Optional[Dict[Union[int, str], float]] = None, default_numerical_step_size: float = 1.0)
Generates counterfactual explanations of black-box classifier predictions.
Finds counterfactual explanations for an arbitrary black-box classifier by a brute-force grid search with a specified step size for a selected range of values for selected features in the dataset.
To generate counterfactuals, either a model or a predictive_function must be given.
If a dataset is given, then only one of the two column indices parameters is required: categorical_indices or numerical_indices. If a dataset is not given, both of these parameters need to be specified.
If only some of the features should appear in the counterfactuals, they may be specified via the counterfactual_feature_indices parameter.
Note
Valid feature (column) indices are either strings for structured arrays or integers for normal numpy arrays.
The user may also restrict the length of any generated counterfactual by providing the max_counterfactual_length parameter.
If a dataset is given, the feature ranges are obtained by taking the minimum and the maximum value for numerical features and all the unique values for categorical (textual) features. Alternatively, the user may define feature ranges via the feature_ranges parameter. All undefined feature ranges are filled in automatically provided that a dataset is given. If some of the feature ranges are not defined and a dataset is not given, an exception is raised. Counterfactuals are only searched for within these feature ranges. Ranges are only required for the features specified by the counterfactual_feature_indices parameter, or for all features if this parameter is not given.
For a given feature combination only the counterfactual(s) closest to the specified data point are retrieved. By default the distance for every numerical feature is \(|x_i - \hat{x}_i|\) and for every categorical feature it is a binary indicator, i.e. \(1\) if the values differ and \(0\) if they agree. If custom distance functions are desired, the user may specify them via the distance_functions parameter. Each distance function has to be a Callable with two input parameters. Finally, the distance can be normalised; please see the documentation of the _get_distance method for details.
Last but not least, when doing grid search through the features to discover counterfactual data points, the user may define the step size between the minimum and the maximum value of the numerical features. This can be done separately for every feature via the step_sizes parameter. For every feature without a defined step size the default value (\(1\)) is used; this can be changed via the default_numerical_step_size parameter.
- Parameters
- model : object, optional (default=None)
A predictive model object that has a predict method.
- predictive_function : Callable, optional (default=None)
A function that takes in a 2-dimensional data array and returns class predictions. (Alternative to the model parameter.)
- dataset : numpy.ndarray, optional (default=None)
A 2-dimensional data array representing a dataset used for the problem modeling. It is advised to use the same dataset as for the training of the model object.
- categorical_indices : List[column indices], optional (default=None)
A list of column indices indicating which columns are categorical.
- numerical_indices : List[column indices], optional (default=None)
A list of column indices indicating which columns are numerical.
- counterfactual_feature_indices : List[column indices], optional (default=None)
A list of column indices indicating which features should be used to compose counterfactual examples. If None, all of the features will be used to generate counterfactuals.
- max_counterfactual_length : integer, optional (default=2)
The maximum length of counterfactuals, i.e. the number of features altered to compose a counterfactual instance. By default it is set to 2. If set to 0, all available features will be used.
- feature_ranges : Dictionary[column indices, ranges], optional (default=None)
A dictionary with keys representing the column (feature) indices and the values representing feature ranges. Numerical feature ranges are represented as a pair of numbers (min, max) where the first number is the lower bound of the range and the second number is the upper bound of the range. For categorical features this should be a list of all the values that are to be tested for this feature. If set to None, a dataset has to be provided to calculate these ranges.
- distance_functions : Dictionary[column indices, Callable], optional (default=None)
A dictionary with keys representing the column (feature) indices and the values representing Python functions (each a Callable that takes two arguments) that will be used to calculate the distance for this particular feature.
- step_sizes : Dictionary[column indices, Number], optional (default=None)
A dictionary with keys representing the column (feature) indices and the values representing the step size for the grid search of this feature. It is only required for the numerical features.
- default_numerical_step_size : Number, optional (default=1)
The default step size used with the grid search of numerical features when generating counterfactuals.
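The interplay of feature_ranges, step_sizes and default_numerical_step_size can be illustrated with a minimal, standard-library-only sketch. The helper candidate_values is hypothetical and not part of the package; in particular, including the upper bound of a numerical range in the grid is an assumption of this sketch, not a documented behaviour.

```python
import math
from typing import List, Tuple, Union

Range = Union[Tuple[float, float], List[Union[float, str]]]


def candidate_values(feature_range: Range, step_size: float = 1.0) -> list:
    """Enumerate the values to try for one feature (hypothetical helper)."""
    if isinstance(feature_range, tuple):
        # Numerical feature: walk from the lower bound towards the upper
        # bound in increments of step_size (upper bound included when it
        # falls exactly on the grid; this is an assumption of the sketch).
        lower, upper = feature_range
        n_steps = math.floor((upper - lower) / step_size)
        return [lower + i * step_size for i in range(n_steps + 1)]
    # Categorical feature: every listed value is a candidate.
    return list(feature_range)


# Illustrative inputs shaped like the constructor parameters.
feature_ranges = {0: (0.0, 1.0), 1: ['a', 'b', 'c']}
step_sizes = {0: 0.25}
default_numerical_step_size = 1.0

# Features without an entry in step_sizes fall back to the default.
grid = {
    index: candidate_values(
        feature_range, step_sizes.get(index, default_numerical_step_size))
    for index, feature_range in feature_ranges.items()
}
print(grid)
```

A smaller step size gives a finer grid and therefore more candidate points to classify, so the cost of the brute-force search grows quickly as step sizes shrink.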
- Attributes
- predict : Callable
A function used to predict the class of counterfactuals.
- all_indices : Set[column indices]
A set of all the column (feature) indices in the data set from which counterfactuals are generated.
- categorical_indices : Set[column indices]
A set of categorical column (feature) indices in the data set.
- numerical_indices : Set[column indices]
A set of numerical column (feature) indices in the data set.
- cf_feature_indices : Set[column indices]
A set of column (feature) indices that will be used to generate counterfactuals; only alterations of these features will be searched to generate counterfactuals.
- feature_ranges : Dictionary[column indices, ranges]
A dictionary with ranges for all of the cf_feature_indices.
- max_counterfactual_length : Number
The maximum length (the number of features altered) of a counterfactual instance.
- distance_functions : Dictionary[column indices, Callable]
A dictionary with distance functions for all of the cf_feature_indices.
- step_sizes : Dictionary[column indices, Numbers]
A dictionary with step sizes for all of the numerical features in the cf_feature_indices.
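As an illustration of the per-feature distances described for this class, the sketch below pairs the two default distances with one custom two-argument function. All function names are hypothetical, and summing the per-feature distances into one value is an assumption of this sketch; the _get_distance method documents the actual aggregation and normalisation.

```python
def numerical_distance(x, y):
    """Default for numerical features: absolute difference."""
    return abs(x - y)


def categorical_distance(x, y):
    """Default for categorical features: 0 if the values agree, 1 otherwise."""
    return 0.0 if x == y else 1.0


def squared_distance(x, y):
    """A custom distance; any callable with two parameters would do."""
    return (x - y) ** 2


# Shaped like the distance_functions parameter: feature index -> callable.
distance_functions = {0: squared_distance}
categorical_indices = {2}


def feature_distance(index, x, y):
    # A custom function takes precedence over the defaults.
    if index in distance_functions:
        return distance_functions[index](x, y)
    if index in categorical_indices:
        return categorical_distance(x, y)
    return numerical_distance(x, y)


def instance_distance(a, b):
    # Assumption of this sketch: per-feature distances are simply summed.
    return sum(
        feature_distance(i, x, y) for i, (x, y) in enumerate(zip(a, b)))


total = instance_distance((1.0, 2.0, 'x'), (3.0, 2.5, 'y'))
print(total)
```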
- Raises
- AttributeError
The predictive_function parameter, if given, does not require 2 non-optional parameters.
One of the distance functions provided via the distance_functions parameter does not require 2 non-optional parameters.
- IncorrectShapeError
The dataset array is not 2-dimensional.
- IndexError
Some of the categorical_indices or numerical_indices are not valid for the input dataset, when the latter is given.
When both categorical_indices and numerical_indices parameters are given alongside a dataset, they do not cover all of the dataset array's column indices.
The union of categorical and numerical indices does not form a series of consecutive integers when the dataset array is a classic numpy array.
Some of the counterfactual_feature_indices are not valid.
Some of the indices (dictionary keys) in the feature_ranges, distance_functions or step_sizes parameters are not valid (consistent with the provided column indices).
- RuntimeError
The model object, if provided, lacks a predict method.
Neither model nor predictive_function was specified; one of these is required.
- TypeError
The predictive_function parameter is not a Python callable, i.e. a Python function.
The categorical_indices parameter is neither a list nor None.
The numerical_indices parameter is neither a list nor None.
Some of the indices given in these two lists do not share a common type; only all strings or all integers are allowed.
The counterfactual_feature_indices parameter is neither a list nor None.
The max_counterfactual_length parameter is not an integer.
The feature_ranges parameter is neither a dictionary nor None.
A feature range is not a list for a categorical feature or a feature range is not a tuple for a numerical feature.
One of the numerical range tuple elements is not a number or all of the elements of a categorical feature range do not share the same type.
The distance_functions parameter is not a dictionary.
One of the distance functions defined via the distance_functions parameter is not a Python callable.
The step_sizes parameter is not a dictionary.
One of the step sizes defined via the step_sizes parameter is not a number.
The default_numerical_step_size parameter is not a number.
- ValueError
Some of the categorical (textual) features in the dataset array (when given) are not indicated by the user, via the categorical_indices parameter, to be categorical (it is not possible to treat textual fields as numerical features).
Some of the categorical features in the dataset array are selected to be numerical via the numerical_indices parameter.
Some of the feature ranges are missing and need to be computed from a dataset but none is given.
The categorical_indices and numerical_indices parameters were not provided in the absence of a dataset.
The dataset array is not of a base type (strings and/or numbers).
Both categorical_indices and numerical_indices parameters are empty lists.
Both of these lists share some common indices.
The counterfactual_feature_indices parameter is an empty list.
The max_counterfactual_length parameter is not a non-negative integer.
The feature_ranges parameter is an empty dictionary.
One of the categorical ranges provided is an empty list.
One of the numerical ranges is a tuple of length other than 2 or the second element of the range tuple is not strictly larger than the first one.
The distance_functions parameter is an empty dictionary.
The step_sizes parameter is an empty dictionary.
Some of the step sizes specified via the step_sizes dictionary are not strictly positive numbers.
The default_numerical_step_size parameter is not a strictly positive number.
When discovering feature ranges from the dataset there is only one value for a numerical feature, meaning that a range cannot be created.
- Warns
- UserWarning
The value of the max_counterfactual_length parameter is larger than the number of features.
A step size (via the step_sizes parameter) is provided for one of the categorical features.
Both a model and a predictive_function parameter are supplied.
When discovering categorical feature ranges from the dataset there is only one unique value for any particular feature.
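The brute-force search described for this class can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the toy predict function, the find_counterfactuals helper, the absolute-difference distance and the choice to count only candidates that change every selected feature are all made up for the example.

```python
import itertools


def predict(row):
    # Toy classifier (made up): class 1 when the features sum to at least 5.
    return 1 if row[0] + row[1] >= 5 else 0


def find_counterfactuals(instance, grid, max_length=2):
    """Return (counterfactual, distance, predicted class) triples."""
    original_class = predict(instance)
    found = []
    # Try every combination of 1, 2, ..., max_length features.
    for length in range(1, max_length + 1):
        for features in itertools.combinations(sorted(grid), length):
            best, best_distance = [], float('inf')
            # Walk the whole grid spanned by the selected features.
            for values in itertools.product(*(grid[f] for f in features)):
                # Choice of this sketch: every selected feature must change.
                if any(v == instance[f] for f, v in zip(features, values)):
                    continue
                candidate = list(instance)
                for f, v in zip(features, values):
                    candidate[f] = v
                if predict(candidate) == original_class:
                    continue  # same class, hence not a counterfactual
                distance = sum(
                    abs(c - o) for c, o in zip(candidate, instance))
                # Keep only the closest counterfactual(s) per combination.
                if distance < best_distance:
                    best, best_distance = [candidate], distance
                elif distance == best_distance:
                    best.append(candidate)
            found.extend((c, best_distance, predict(c)) for c in best)
    return found


grid = {0: [0, 1, 2, 3, 4, 5], 1: [0, 1, 2, 3, 4, 5]}
results = find_counterfactuals([1, 2], grid)
for counterfactual, distance, predicted_class in results:
    print(counterfactual, distance, predicted_class)
```

The number of grid points grows multiplicatively with every extra feature in a combination, which is one reason to keep max_counterfactual_length small.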
Methods
explain_instance(instance, …)
Finds counterfactual data points, their class and distance.
explain_instance(instance: Union[numpy.ndarray, numpy.void], counterfactual_class: Union[int, str, None] = None, normalise_distance: bool = False) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Finds counterfactual data points, their class and distance.
Returns three numpy arrays holding the counterfactual instances, their distances from the given data instance and their predicted classes. The counterfactual class can be selected by the user; otherwise all classes other than the one predicted for the instance are considered.
- Parameters
- instance : numpy.ndarray or numpy.void
A 1-dimensional numpy array representing a data point for which counterfactuals are desired.
- counterfactual_class : string or integer, optional (default=None)
A class of counterfactual instances. If None, counterfactuals of all classes other than the predicted class of the input instance will be returned.
- normalise_distance : boolean, optional (default=False)
Whether to normalise the distance, cf. the _get_distance method for more details.
- Returns
- counterfactuals : numpy.ndarray
A 2-dimensional numpy array with counterfactual data points.
- counterfactuals_distances : numpy.ndarray
A 1-dimensional numpy array with distances from the input instance to every counterfactual data point.
- counterfactuals_predictions : numpy.ndarray
A 1-dimensional numpy array with predictions for every counterfactual data point.
- Raises
- IncorrectShapeError
The input instance is not a 1-dimensional numpy array.
- IndexError
The indices that were used to initialise this class are not valid for the given input instance.
- TypeError
The counterfactual_class parameter is neither a string nor an integer.
The normalise_distance parameter is not a boolean.
- ValueError
The input instance is not of a base type (string and/or integer).
- Warns
- UserWarning
When generating counterfactuals the value of one of the features for the specified input instance is outside of the specified range for this feature.
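The three values returned by explain_instance are parallel: the i-th row of counterfactuals goes with the i-th distance and the i-th prediction. A minimal sketch of consuming them follows; plain lists stand in for the numpy arrays, the values are invented for illustration, and the describe helper is hypothetical.

```python
# Invented stand-ins for the (counterfactuals, counterfactuals_distances,
# counterfactuals_predictions) triple; real calls return numpy arrays.
instance = [1, 2]
counterfactuals = [[3, 2], [1, 4], [2, 3], [5, 2]]
counterfactuals_distances = [2, 2, 2, 4]
counterfactuals_predictions = [1, 1, 1, 1]


def describe(instance, counterfactual, distance, prediction):
    """List the altered features of one counterfactual (hypothetical helper)."""
    changes = [
        f'feature {i}: {old} -> {new}'
        for i, (old, new) in enumerate(zip(instance, counterfactual))
        if old != new
    ]
    return f'class {prediction} at distance {distance}: ' + '; '.join(changes)


# Pair the parallel sequences up and rank the closest counterfactuals first.
ranked = sorted(
    zip(counterfactuals, counterfactuals_distances,
        counterfactuals_predictions),
    key=lambda triple: triple[1])
report = [describe(instance, cf, dist, pred) for cf, dist, pred in ranked]
for line in report:
    print(line)
```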