fatf.utils.data.augmentation.DecisionBoundarySphere

class fatf.utils.data.augmentation.DecisionBoundarySphere(dataset: numpy.ndarray, predictive_function: Callable[[numpy.ndarray], numpy.ndarray], categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True, radius_init: float = 0.01, radius_increment: float = 0.01)

Sampling data in a hypersphere around the closest decision boundary.

New in version 0.0.2.
DecisionBoundarySphere implements an adapted version of the local surrogate sampling introduced by [LAUGEL2018DEFINING]. A hypersphere is grown around the specified data point until a decision boundary is found. Data points are then sampled uniformly within an l2 hypersphere of a user-defined radius centred on a point on this decision boundary.

Note
Categorical features.
This augmenter does not currently support data sets with categorical features.

For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation.

[LAUGEL2018DEFINING] Laugel, T., Renard, X., Lesot, M. J., Marsala, C., & Detyniecki, M. (2018). Defining locality for surrogates in post-hoc interpretability. Workshop on Human Interpretability for Machine Learning (WHI) – International Conference on Machine Learning, 2018.
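
The second stage described above, uniform sampling within an l2 hypersphere, can be sketched with plain numpy. This is only an illustration of the general technique and not necessarily how FAT Forensics implements it; the helper name `sample_l2_ball` and the example `boundary_point` are made up for this sketch.

```python
import numpy as np


def sample_l2_ball(centre, radius, samples_number, seed=None):
    """Draw points uniformly from the l2 ball of a given radius around centre.

    Hypothetical helper written for illustration; it is not part of fatf.
    """
    rng = np.random.default_rng(seed)
    centre = np.asarray(centre, dtype=float)
    dim = centre.shape[0]

    # A normalised Gaussian vector is uniformly distributed on the unit sphere.
    directions = rng.normal(size=(samples_number, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Scaling by U**(1/dim) makes the points uniform in volume, not in radius.
    radii = radius * rng.uniform(size=(samples_number, 1)) ** (1.0 / dim)

    return centre + radii * directions


# A made-up point standing in for a location on the decision boundary.
boundary_point = np.array([0.2, -0.1, 0.4])
samples = sample_l2_ball(boundary_point, radius=0.05, samples_number=50, seed=42)
```

Drawing the distance from the centre directly from a uniform distribution would concentrate points near the centre, which is why the radius is scaled by the `1/dim` power instead.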
Parameters
predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
A Python callable, e.g., a function, that is either a classifier or a probabilistic predictor. This function is used to compute the class of the sampled data, which is used to identify a decision boundary. A probabilistic function is expected to output a 2-dimensional numpy array, with the assigned class being the one with the maximum probability. A classifier function is expected to output a 1-dimensional numpy array with class assignments. The predictive_function should require exactly one input parameter: a data array to be predicted.
radius_init : float, optional (default=0.01)
The initial radius of the hypersphere placed around the specified data point to discover a decision boundary.
radius_increment : float, optional (default=0.01)
The additive increment by which the hypersphere radius is increased in every iteration of the sampling procedure if no decision boundary has been discovered.
Attributes
predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
The predictive function used to initialise this class.
is_probabilistic : boolean
True if the predictive_function is probabilistic, False otherwise. This is set based on the shape of the numpy array output by the predictive_function: if it is a 2-dimensional array, the predictive_function is assumed to be probabilistic; if it is a 1-dimensional array, the predictive_function is assumed to be a classifier.
radius_init : float
The initial radius of a hypersphere placed around the specified data point within which new data points will be sampled to discover a decision boundary.
radius_increment : float
The additive increment by which the hypersphere radius is increased in every iteration of the sampling procedure if no decision boundary has been discovered.
Raises
IncompatibleModelError
The predictive_function does not require exactly one input parameter.
NotImplementedError
Some of the features in the data set are categorical; this feature type is not supported at present.
TypeError
The predictive_function parameter is not a Python callable. Either the radius_init or radius_increment parameter is not a number.
ValueError
Either the radius_init or radius_increment parameter is less than or equal to 0.
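
To make the predictive_function contract and the is_probabilistic rule concrete, here is a minimal sketch of two callables that satisfy it. Both functions and the `predict_classes` helper are hypothetical and written only for this illustration; the dimensionality check mirrors the rule stated in the Attributes section rather than the library's actual code.

```python
import numpy as np


def classifier(data):
    """Hypothetical crisp classifier: class 1 if the first feature is positive."""
    return (data[:, 0] > 0).astype(int)


def probabilistic_predictor(data):
    """Hypothetical probabilistic predictor built on a logistic curve."""
    positive = 1.0 / (1.0 + np.exp(-data[:, 0]))
    # One column per class, so the output is a 2-dimensional array.
    return np.column_stack([1.0 - positive, positive])


def predict_classes(predictive_function, data):
    """Return class assignments and whether the output looked probabilistic."""
    output = np.asarray(predictive_function(data))
    # 2-dimensional output is treated as probabilities, 1-dimensional as classes.
    is_probabilistic = output.ndim == 2
    classes = output.argmax(axis=1) if is_probabilistic else output
    return classes, is_probabilistic


data = np.array([[-1.5, 0.3], [0.7, -0.2], [2.0, 1.1]])
crisp_classes, crisp_flag = predict_classes(classifier, data)
proba_classes, proba_flag = predict_classes(probabilistic_predictor, data)
```

Both callables take exactly one argument, the data array, which is the requirement whose violation is reported as IncompatibleModelError above.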
Methods
sample(data_row, sphere_radius, …)
Samples data around the closest decision boundary to the data_row.

sample(data_row: Union[numpy.ndarray, numpy.void], sphere_radius: float = 0.05, samples_number: int = 50, discover_samples_number: int = 100, max_iter: int = 1000) → numpy.ndarray

Samples data around the closest decision boundary to the data_row.

For additional documentation of the input parameters, warnings and errors please see the description of the fatf.utils.data.augmentation.Augmentation.sample method in the parent fatf.utils.data.augmentation.Augmentation class.

Parameters
sphere_radius : float, optional (default=0.05)
Radius of the hypersphere around the closest decision boundary to data_row within which new data points will be sampled.
discover_samples_number : integer, optional (default=100)
Number of samples generated at each iteration of the sampling procedure that are used to discover the nearest decision boundary around the data_row.
max_iter : integer, optional (default=1000)
The maximum number of iterations of the procedure that grows the hypersphere around the data_row. If the limit is reached and a decision boundary has not been found, a RuntimeError is raised. If this is the case, you may want to consider initialising the class with a larger radius_init or radius_increment parameter. Alternatively, increasing the discover_samples_number or max_iter parameter may help to discover the nearest boundary with all the other parameters fixed.
Returns
samples : numpy.ndarray
A numpy array of shape [samples_number, number of features] that holds the sampled data.
Raises
NotImplementedError
The data_row is None; sampling from the mean of the dataset used to initialise this class is not yet implemented.
RuntimeError
The maximum number of iterations was reached without the algorithm discovering a decision boundary.
TypeError
The sphere_radius parameter is not a number. The discover_samples_number or max_iter parameter is not an integer.
ValueError
The sphere_radius, discover_samples_number or max_iter parameter is not a positive number (greater than 0).
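
The interplay of radius_init, radius_increment, discover_samples_number and max_iter can be sketched as a simple loop. This is a rough illustration under stated assumptions, not the library's implementation: the helper `find_boundary_point`, the toy `classifier`, and the choice of the closest differently-classified sample as the boundary estimate are all assumptions made for this example.

```python
import numpy as np


def find_boundary_point(data_row, predictive_function, radius_init=0.01,
                        radius_increment=0.01, discover_samples_number=100,
                        max_iter=1000, seed=None):
    """Grow a hypersphere around data_row until another class is sampled.

    Hypothetical helper written for illustration; it is not part of fatf.
    """
    rng = np.random.default_rng(seed)
    data_row = np.asarray(data_row, dtype=float)
    dim = data_row.shape[0]
    row_class = predictive_function(data_row[np.newaxis, :])[0]

    radius = radius_init
    for _ in range(max_iter):
        # Draw uniform samples inside the current hypersphere.
        directions = rng.normal(size=(discover_samples_number, dim))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        scale = rng.uniform(size=(discover_samples_number, 1)) ** (1.0 / dim)
        candidates = data_row + radius * scale * directions

        differs = predictive_function(candidates) != row_class
        if differs.any():
            # Assumption: take the closest differently-classified sample
            # as the estimate of a point on the decision boundary.
            crossing = candidates[differs]
            distances = np.linalg.norm(crossing - data_row, axis=1)
            return crossing[distances.argmin()], radius

        # No boundary found yet, so enlarge the hypersphere and retry.
        radius += radius_increment

    raise RuntimeError('No decision boundary found within max_iter iterations.')


def classifier(data):
    """Hypothetical crisp classifier: class 1 if the first feature exceeds 0.5."""
    return (data[:, 0] > 0.5).astype(int)


data_row = np.array([0.0, 0.0])
boundary_point, final_radius = find_boundary_point(data_row, classifier, seed=0)
```

With the defaults, the boundary at distance 0.5 from `data_row` is only reachable after roughly fifty increments, which shows why a small max_iter combined with a small radius_init and radius_increment can end in a RuntimeError.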