fatf.transparency.models.submodular_pick.submodular_pick

fatf.transparency.models.submodular_pick.submodular_pick(dataset, explain_instance, sample_size=0, explanations_number=5)

Applies submodular pick to explanations of a given subset of data.

New in version 0.1.1.

Chooses the most informative data point explanations using the submodular pick algorithm introduced by [RIBEIRO2016WHY]. Submodular pick applies a greedy optimisation to maximise the coverage function for explanations of a subset of points taken from the input data set. The explanation function (explain_instance) must have exactly one required parameter and return a dictionary mapping feature names or indices to their respective importance.
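The greedy coverage step described above can be sketched in plain Python. This is a minimal illustration, not the fatf implementation: the helper name greedy_pick and the toy explanations are hypothetical, and the global feature importance is taken as the square root of the summed absolute weights, which is one choice used in the cited paper.

```python
import math


def greedy_pick(explanations, budget):
    """Return indices of up to `budget` explanations chosen greedily."""
    # Assumed global importance of a feature: square root of its summed
    # absolute weights over all explanations.
    totals = {}
    for explanation in explanations:
        for feature, weight in explanation.items():
            totals[feature] = totals.get(feature, 0.0) + abs(weight)
    importance = {f: math.sqrt(t) for f, t in totals.items()}

    # Features that each explanation actually uses (non-zero weight).
    used = [{f for f, w in e.items() if w != 0} for e in explanations]

    picked, covered = [], set()
    candidates = list(range(len(explanations)))
    while candidates and len(picked) < budget:
        # Marginal gain: total importance of features not yet covered.
        best = max(
            candidates,
            key=lambda i: sum(importance[f] for f in used[i] - covered))
        picked.append(best)
        covered |= used[best]
        candidates.remove(best)
    return picked


# Hypothetical explanations for three data points.
explanations = [
    {'a': 0.9, 'b': 0.1},
    {'a': 0.8},
    {'c': 0.5, 'd': 0.4},
]
picked = greedy_pick(explanations, budget=2)
```

In this toy case the second explanation only repeats feature 'a', so the greedy step prefers the third one, which covers features not yet seen.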

Parameters
dataset : numpy.ndarray

A data set from which to select individual instances to be explained.

explain_instance : callable

A reference to a function or method that can generate an explanation from an array representing an individual instance. This callable must accept exactly one required parameter and return an explanation around the selected data point – a dictionary mapping feature names or indices to their importance.

sample_size : integer, optional (default=0)

The number of (randomly selected) data points for which to generate explanations. If 0, explanations for all the data points in the dataset will be generated.

explanations_number : integer, optional (default=5)

The number of explanations to return. If 0, an ordered list of all explanations generated for the selected data subset is returned.
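A callable that satisfies the explain_instance contract might look like the sketch below. The factory make_explainer, the weights and the feature names are hypothetical; any explainer can be wrapped the same way, as long as the resulting callable takes the instance as its only required parameter and returns a dictionary of importances.

```python
def make_explainer(weights, feature_names):
    """Build a one-parameter explainer (hypothetical linear attribution)."""
    def explain_instance(instance):
        # Importance of a feature: its weight times its value.
        return {
            name: weight * value
            for name, weight, value in zip(feature_names, weights, instance)
        }
    return explain_instance


explain_instance = make_explainer(
    weights=[0.5, -2.0, 0.0],
    feature_names=['age', 'income', 'height'],
)
explanation = explain_instance([2.0, 1.0, 3.0])
```

Binding the extra arguments in a closure keeps the callable's signature down to the single required parameter that the function expects.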

Returns
sp_explanations : List[Dictionary[Union[integer, string], Number]]

List of explanations chosen by the submodular pick algorithm.

sp_indices : List[integer]

List of indices for rows in the dataset chosen (and explained) by the submodular pick algorithm.

Raises
IncorrectShapeError

The input data set is not a 2-dimensional numpy array.

TypeError

sample_size or explanations_number is not an integer. explain_instance is not a Python callable (function or method).

ValueError

The input data set contains types other than base types (strings and numbers). sample_size or explanations_number is a negative integer. The number of requested explanations is larger than the number of samples in the data set.

RuntimeError

The explain_instance callable does not require exactly one parameter.
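To see whether a callable would pass the one-required-parameter check before calling the function, the required parameters can be counted with the standard inspect module. This is a sketch of one way to perform such a check, assuming that a parameter is required when it has no default value; the helper count_required_parameters is hypothetical and is not part of fatf.

```python
import inspect


def count_required_parameters(func):
    """Count parameters without a default, ignoring *args and **kwargs."""
    required = 0
    for param in inspect.signature(func).parameters.values():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if param.default is inspect.Parameter.empty:
            required += 1
    return required


def good_explainer(instance, scale=1.0):
    return {}


def bad_explainer(instance, model):
    return {}


good_count = count_required_parameters(good_explainer)
bad_count = count_required_parameters(bad_explainer)
```

Under this assumption good_explainer has one required parameter and would be accepted, while bad_explainer has two and would correspond to the RuntimeError case.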

Warns
UserWarning

sample_size is larger than the number of instances (rows) available in the dataset, in which case the entire data set is used. The number of requested explanations is larger than the number of instances selected to generate explanations, in which case explanations for all the data points in the sample are generated.
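The interplay between sample_size, explanations_number and the two warnings can be sketched as follows. This only models the documented behaviour and is not the library's code: the helper resolve_sizes is hypothetical, sampling without replacement is an assumption, and the explanations_number=0 case is left out for brevity.

```python
import random
import warnings


def resolve_sizes(n_rows, sample_size=0, explanations_number=5, seed=None):
    """Return the row indices to explain and the effective pick count."""
    if explanations_number > n_rows:
        raise ValueError('More explanations requested than rows available.')
    if sample_size > n_rows:
        warnings.warn('sample_size exceeds the number of rows; '
                      'using the whole data set.', UserWarning)
        sample_size = 0
    if sample_size == 0:
        rows = list(range(n_rows))
    else:
        # Assumption: rows are drawn at random without replacement.
        rows = random.Random(seed).sample(range(n_rows), sample_size)
    if explanations_number > len(rows):
        warnings.warn('More explanations requested than rows sampled; '
                      'returning one explanation per sampled row.',
                      UserWarning)
        explanations_number = len(rows)
    return rows, explanations_number
```

For example, resolve_sizes(4, sample_size=10, explanations_number=3) warns once and falls back to all four rows, while resolve_sizes(10, sample_size=3, explanations_number=5) warns and reduces the number of picks to three.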

References

RIBEIRO2016WHY

Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144). ACM.