fatf.utils.data.augmentation.DecisionBoundarySphere

class fatf.utils.data.augmentation.DecisionBoundarySphere(dataset: numpy.ndarray, predictive_function: Callable[[numpy.ndarray], numpy.ndarray], categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True, radius_init: float = 0.01, radius_increment: float = 0.01)

Sampling data in a hyper-sphere around the closest decision boundary.

New in version 0.0.2.

DecisionBoundarySphere implements an adapted version of the local surrogate sampling introduced by [LAUGEL2018DEFINING]. A hyper-sphere is grown around the specified data point until a decision boundary is found. New data points are then sampled uniformly within an l-2 hyper-sphere of a user-defined radius centred on a point on this decision boundary.
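The uniform sampling step can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function name `sample_in_l2_ball` and its `seed` argument are hypothetical, and the approach shown (normalised Gaussian directions scaled by a radius drawn as U^(1/d)) is one standard way to sample uniformly within an l-2 ball.

```python
import numpy as np


def sample_in_l2_ball(centre, radius, samples_number, seed=None):
    """Draws points uniformly from an l-2 ball around ``centre`` (sketch)."""
    rng = np.random.default_rng(seed)
    centre = np.asarray(centre, dtype=float)
    dim = centre.shape[0]
    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    directions = rng.normal(size=(samples_number, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Scaling by U^(1/d) makes the points uniform in volume rather than radius.
    lengths = radius * rng.uniform(size=(samples_number, 1)) ** (1.0 / dim)
    return centre + directions * lengths


# Hypothetical centre; 0.05 and 50 mirror the defaults of the sample method.
samples = sample_in_l2_ball([0.0, 0.0, 0.0], radius=0.05, samples_number=50,
                            seed=42)
```

Every returned row lies within `radius` of `centre`, and the array has one row per requested sample and one column per feature.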

Note

Categorical features.

This augmenter does not currently support data sets with categorical features.

For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation.

LAUGEL2018DEFINING

Laugel, T., Renard, X., Lesot, M. J., Marsala, C., & Detyniecki, M. (2018). Defining locality for surrogates in post-hoc interpretablity. Workshop on Human Interpretability for Machine Learning (WHI) – International Conference on Machine Learning, 2018.

Parameters
predictive_function : Callable[[numpy.ndarray], numpy.ndarray]

A Python callable, e.g., a function, that is either a classifier or a probabilistic predictor. This function is used to compute the class of the sampled data, which is used to identify a decision boundary. A probabilistic function is expected to output a 2-dimensional numpy array, with the assigned class being the one with the maximum probability. A classifier function is expected to output a 1-dimensional numpy array of class assignments. The predictive_function should require exactly one input parameter: a data array to be predicted.

radius_init : float, optional (default=0.01)

The initial radius of the hyper-sphere placed around the specified data point to discover a decision boundary.

radius_increment : float, optional (default=0.01)

The amount added to the hyper-sphere radius in every iteration of the sampling procedure in which no decision boundary has been discovered.

Attributes
predictive_function : Callable[[numpy.ndarray], numpy.ndarray]

The predictive function used to initialise this class.

is_probabilistic : boolean

True if the predictive_function is probabilistic, False otherwise. This is set based on the shape of the numpy array output by the predictive_function: if it is a 2-dimensional array, the predictive_function is assumed to be probabilistic; if it is a 1-dimensional array, the predictive_function is assumed to be a classifier.
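The shape-based rule can be sketched as follows. This is an illustration only, not the library's code; the helper name `predicted_classes` is hypothetical.

```python
import numpy as np


def predicted_classes(predictions):
    """Reduces model output to one class assignment per row (sketch)."""
    predictions = np.asarray(predictions)
    if predictions.ndim == 2:
        # Probabilistic output: the class is the column index with the
        # highest probability in each row.
        return np.argmax(predictions, axis=1)
    if predictions.ndim == 1:
        # Classifier output: already one class assignment per row.
        return predictions
    raise ValueError('Expected a 1- or 2-dimensional array of predictions.')


probabilities = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])
labels = np.array([0, 1, 1])
```

Both `predicted_classes(probabilities)` and `predicted_classes(labels)` describe the same three class assignments, which is what allows either kind of predictive function to be used for locating a decision boundary.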

radius_init : float

The initial radius of a hyper-sphere placed around the specified data point within which new data points will be sampled to discover a decision boundary.

radius_increment : float

The amount added to the hyper-sphere radius in every iteration of the sampling procedure in which no decision boundary has been discovered.

Raises
IncompatibleModelError

The predictive_function does not require exactly one input parameter.

NotImplementedError

Some of the features in the data set are categorical – this feature type is not supported at present.

TypeError

The predictive_function parameter is not a Python callable. Either the radius_init or radius_increment parameter is not a number.

ValueError

Either the radius_init or radius_increment parameter is less than or equal to 0.

Methods

sample(data_row, sphere_radius, …)

Samples data around the closest decision boundary to the data_row.

sample(data_row: Union[numpy.ndarray, numpy.void], sphere_radius: float = 0.05, samples_number: int = 50, discover_samples_number: int = 100, max_iter: int = 1000) → numpy.ndarray

Samples data around the closest decision boundary to the data_row.

For the additional documentation of the input parameters, warnings and errors please see the description of the fatf.utils.data.augmentation.Augmentation.sample method in the parent fatf.utils.data.augmentation.Augmentation class.

Parameters
sphere_radius : float, optional (default=0.05)

Radius of the hyper-sphere around the closest decision boundary to data_row within which new data points will be sampled.

discover_samples_number : integer, optional (default=100)

Number of samples generated at each iteration of the sampling procedure that are used to discover the nearest decision boundary around the data_row.

max_iter : integer, optional (default=1000)

The maximum number of iterations of the procedure that grows a hyper-sphere around the data_row. If the limit is reached and a decision boundary has not been found, a RuntimeError is raised. In that case, consider initialising the class with a larger radius_init or radius_increment parameter. Alternatively, increasing the discover_samples_number or max_iter parameter may help to discover the nearest boundary with all the other parameters fixed.

Returns
samples : numpy.ndarray

A numpy array of shape [samples_number, number of features] that holds the sampled data.

Raises
NotImplementedError

The data_row is None – sampling from the mean of the dataset used to initialise this class is not yet implemented.

RuntimeError

The maximum number of iterations was reached without the algorithm discovering a decision boundary.

TypeError

The sphere_radius parameter is not a number. The discover_samples_number or max_iter parameter is not an integer.

ValueError

The sphere_radius, discover_samples_number or max_iter parameter is not a positive number (greater than 0).
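The iterative boundary discovery controlled by radius_init, radius_increment, discover_samples_number and max_iter can be sketched with plain NumPy. This is a simplified illustration under stated assumptions, not the library's implementation: the names `find_boundary_point` and `threshold_classifier` are hypothetical, the predictive function is assumed to be a classifier returning a 1-dimensional array, and taking the nearest differently classified sample as the boundary point is an assumption of this sketch.

```python
import numpy as np


def find_boundary_point(data_row, predict, radius_init=0.01,
                        radius_increment=0.01, discover_samples_number=100,
                        max_iter=1000, seed=None):
    """Grows a sphere around ``data_row`` until another class is sampled."""
    rng = np.random.default_rng(seed)
    data_row = np.asarray(data_row, dtype=float)
    dim = data_row.shape[0]
    row_class = predict(data_row[np.newaxis, :])[0]
    radius = radius_init
    for _ in range(max_iter):
        # Draw candidate points uniformly within the current sphere.
        directions = rng.normal(size=(discover_samples_number, dim))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        lengths = radius * rng.uniform(
            size=(discover_samples_number, 1)) ** (1.0 / dim)
        candidates = data_row + directions * lengths
        # Keep only candidates whose class differs from that of data_row.
        other = candidates[predict(candidates) != row_class]
        if other.shape[0]:
            # Assumption: use the closest such candidate as the boundary point.
            distances = np.linalg.norm(other - data_row, axis=1)
            return other[np.argmin(distances)]
        radius += radius_increment
    raise RuntimeError('No decision boundary found within max_iter iterations.')


def threshold_classifier(data):
    """Hypothetical classifier: class 1 if the first feature exceeds 0.5."""
    return (data[:, 0] > 0.5).astype(int)


boundary_point = find_boundary_point([0.3, 0.0], threshold_classifier, seed=0)
```

In this sketch the sphere radius in the last iteration is roughly radius_init + (max_iter - 1) * radius_increment, i.e., about 10 with the default values. A boundary that lies further away than that is never reached, which is why larger radius_init, radius_increment or max_iter values help in that situation.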