fatf.utils.data.augmentation.DecisionBoundarySphere
class fatf.utils.data.augmentation.DecisionBoundarySphere(dataset: numpy.ndarray, predictive_function: Callable[[numpy.ndarray], numpy.ndarray], categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True, radius_init: float = 0.01, radius_increment: float = 0.01)
Sampling data in a hyper-sphere around the closest decision boundary.
New in version 0.0.2.
DecisionBoundarySphere implements an adapted version of the local surrogate sampling introduced by [LAUGEL2018DEFINING]. A hyper-sphere is grown around the specified data point until a decision boundary is found; then, from a point on this decision boundary, data points are sampled uniformly within an l2 hyper-sphere of a user-specified radius.
Note
Categorical features.
This augmenter does not currently support data sets with categorical features.
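The final step above draws points "uniformly in an l2 hyper-sphere". One standard way to do this is sketched below in plain Python: normalise a vector of standard normal draws to get a uniform direction, then scale its length by radius * U^(1/d) so points are uniform in volume. This is an illustrative sketch only; the function name sample_l2_ball is hypothetical and not part of the fatf API, and fatf itself may use a different sampling scheme.

```python
import math
import random
from typing import List, Optional


def sample_l2_ball(centre: List[float], radius: float, samples_number: int,
                   rng: Optional[random.Random] = None) -> List[List[float]]:
    """Hypothetical helper: draw points uniformly from an l2 ball around centre."""
    rng = rng if rng is not None else random.Random()
    dim = len(centre)
    samples = []
    for _ in range(samples_number):
        # Independent standard normal coordinates give a direction that is
        # uniformly distributed on the unit sphere once normalised.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in direction))
        # Scaling the length by radius * U**(1/dim) makes the points uniform
        # in volume instead of crowding them near the centre.
        length = radius * rng.random() ** (1.0 / dim)
        samples.append([c + length * x / norm for c, x in zip(centre, direction)])
    return samples


# Usage: 50 points within radius 0.05 of a 3-feature data point.
points = sample_l2_ball([1.0, -2.0, 0.5], 0.05, 50, rng=random.Random(0))
```

Sampling the radius uniformly instead of as U^(1/d) would over-represent points near the centre, and the effect grows with the number of features.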
For additional parameters, attributes, warnings and exceptions raised by this class, please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation.
- LAUGEL2018DEFINING
Laugel, T., Renard, X., Lesot, M. J., Marsala, C., & Detyniecki, M. (2018). Defining locality for surrogates in post-hoc interpretablity. Workshop on Human Interpretability for Machine Learning (WHI) – International Conference on Machine Learning, 2018.
- Parameters
- predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
A Python callable, e.g., a function, that is either a classifier or a probabilistic predictor. It is used to compute the class of the sampled data, which in turn identifies a decision boundary. A probabilistic function is expected to output a 2-dimensional numpy array, with the assigned class being the one with the maximum probability. A classifier function is expected to output a 1-dimensional numpy array of class assignments. The predictive_function should require exactly one input parameter: a data array to be predicted.
- radius_init : float, optional (default=0.01)
The initial radius of the hyper-sphere placed around the specified data point to discover a decision boundary.
- radius_increment : float, optional (default=0.01)
The amount added to the hyper-sphere radius in every iteration of the sampling procedure in which no decision boundary has been discovered.
- Attributes
- predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
The predictive function used to initialise this class.
- is_probabilistic : boolean
True if the predictive_function is probabilistic, False otherwise. This is set based on the shape of the numpy array output by the predictive_function: if it is a 2-dimensional array, the predictive_function is assumed to be probabilistic; if it is a 1-dimensional array, the predictive_function is assumed to be a classifier.
- radius_init : float
The initial radius of the hyper-sphere placed around the specified data point within which new data points are sampled to discover a decision boundary.
- radius_increment : float
The amount added to the hyper-sphere radius in every iteration of the sampling procedure in which no decision boundary has been discovered.
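The rule for is_probabilistic, and the "maximum probability" rule for assigning classes, can be illustrated with a small stdlib sketch. Here nested Python lists stand in for a 2-dimensional numpy array (with numpy one would check the array's ndim instead), and the helper names infer_is_probabilistic and predicted_classes, as well as the toy models, are hypothetical illustrations rather than fatf code.

```python
from typing import Callable, List, Union

# A flat list plays the role of a 1-D array; a list of rows plays a 2-D array.
Prediction = Union[List[int], List[List[float]]]


def infer_is_probabilistic(predictive_function: Callable[[list], Prediction],
                           probe: List[List[float]]) -> bool:
    """Hypothetical helper: 2-D output -> probabilistic, 1-D output -> classifier."""
    output = predictive_function(probe)  # probe must contain at least one row
    return isinstance(output[0], (list, tuple))


def predicted_classes(output: Prediction, is_probabilistic: bool) -> List[int]:
    """Turn raw model output into class labels."""
    if is_probabilistic:
        # The assigned class is the column holding the maximum probability.
        return [max(range(len(row)), key=row.__getitem__) for row in output]
    return list(output)


def toy_classifier(rows):
    # Toy model: class 1 when the first feature is positive, class 0 otherwise.
    return [int(r[0] > 0) for r in rows]


def toy_probabilistic(rows):
    # Same decision rule expressed as per-class probabilities.
    return [[0.2, 0.8] if r[0] > 0 else [0.9, 0.1] for r in rows]
```

Both toy models assign the same classes, so the sketch lets the rest of a sampling procedure work with labels regardless of which kind of predictor was supplied.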
- Raises
- IncompatibleModelError
The predictive_function does not require exactly one input parameter.
- NotImplementedError
Some of the features in the data set are categorical; this feature type is not supported at present.
- TypeError
The predictive_function parameter is not a Python callable, or either the radius_init or radius_increment parameter is not a number.
- ValueError
Either the radius_init or radius_increment parameter is less than or equal to 0.
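The constructor checks listed above can be mirrored in a short sketch. It counts the parameters without default values via inspect.signature to test the "exactly one input parameter" requirement and then validates the two radii. This is an assumption about how such checks might look, not fatf's implementation: IncompatibleModelError below is a local stand-in class, validate_init and count_required_parameters are hypothetical names, and rejecting booleans is this sketch's own choice.

```python
import inspect
import numbers
from typing import Callable


class IncompatibleModelError(Exception):
    """Local stand-in for the IncompatibleModelError named above."""


def count_required_parameters(function: Callable) -> int:
    """Count parameters without a default value (ignoring *args and **kwargs)."""
    required_kinds = (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    return sum(1 for p in inspect.signature(function).parameters.values()
               if p.default is inspect.Parameter.empty and p.kind in required_kinds)


def validate_init(predictive_function, radius_init=0.01, radius_increment=0.01):
    """Hypothetical helper raising the documented exception types."""
    if not callable(predictive_function):
        raise TypeError('The predictive_function parameter is not a Python callable.')
    if count_required_parameters(predictive_function) != 1:
        raise IncompatibleModelError(
            'The predictive_function should require exactly one input parameter.')
    for name, value in (('radius_init', radius_init),
                        ('radius_increment', radius_increment)):
        # bool is a subclass of int; this sketch chooses to reject it.
        if isinstance(value, bool) or not isinstance(value, numbers.Real):
            raise TypeError(f'The {name} parameter is not a number.')
        if value <= 0:
            raise ValueError(f'The {name} parameter must be greater than 0.')
```

A function such as lambda X, y=None: X passes the arity check, because only X lacks a default value.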
Methods
sample(data_row, sphere_radius, …) Samples data around the closest decision boundary to the data_row.
sample(data_row: Union[numpy.ndarray, numpy.void], sphere_radius: float = 0.05, samples_number: int = 50, discover_samples_number: int = 100, max_iter: int = 1000) → numpy.ndarray
Samples data around the closest decision boundary to the data_row.
For additional documentation of the input parameters, warnings and errors, please see the description of the fatf.utils.data.augmentation.Augmentation.sample method in the parent fatf.utils.data.augmentation.Augmentation class.
- Parameters
- sphere_radius : float, optional (default=0.05)
Radius of the hyper-sphere around the closest decision boundary to the data_row within which new data points will be sampled.
- discover_samples_number : integer, optional (default=100)
Number of samples generated in each iteration of the sampling procedure to discover the nearest decision boundary around the data_row.
- max_iter : integer, optional (default=1000)
The maximum number of iterations of the procedure that grows the hyper-sphere around the data_row. If this limit is reached before a decision boundary is found, a RuntimeError is raised. In that case, consider initialising the class with a larger radius_init or radius_increment parameter. Alternatively, increasing the discover_samples_number or max_iter parameter may help to discover the nearest boundary with all the other parameters fixed.
- Returns
- samples : numpy.ndarray
A numpy array of shape [samples_number, number of features] that holds the sampled data.
- Raises
- NotImplementedError
The data_row is None; sampling from the mean of the dataset used to initialise this class is not yet implemented.
- RuntimeError
The maximum number of iterations was reached without the algorithm discovering a decision boundary.
- TypeError
The sphere_radius parameter is not a number, or the discover_samples_number or max_iter parameter is not an integer.
- ValueError
The sphere_radius, discover_samples_number or max_iter parameter is not a positive number (greater than 0).
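The interplay of radius_init, radius_increment, discover_samples_number, max_iter and sphere_radius can be sketched end to end in plain Python. This is a simplified illustration under stated assumptions, not fatf's implementation: in each iteration it samples a full ball (fatf may sample differently, e.g. in a shell), it treats the closest differently classified sample as the boundary point rather than locating the boundary exactly, and it works with a classifier that returns labels. All function names here are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional

Row = List[float]


def _uniform_ball(centre: Row, radius: float, n: int, rng: random.Random) -> List[Row]:
    """Uniform samples from the l2 ball of radius around centre."""
    out = []
    for _ in range(n):
        direction = [rng.gauss(0.0, 1.0) for _ in centre]
        norm = math.sqrt(sum(x * x for x in direction))
        length = radius * rng.random() ** (1.0 / len(centre))
        out.append([c + length * x / norm for c, x in zip(centre, direction)])
    return out


def find_boundary_point(data_row: Row, classify: Callable[[List[Row]], List[int]],
                        radius_init: float = 0.01, radius_increment: float = 0.01,
                        discover_samples_number: int = 100, max_iter: int = 1000,
                        rng: Optional[random.Random] = None) -> Row:
    """Grow a sphere around data_row until a differently classified sample appears."""
    rng = rng if rng is not None else random.Random()
    original_class = classify([data_row])[0]
    radius = radius_init
    for _ in range(max_iter):
        candidates = _uniform_ball(data_row, radius, discover_samples_number, rng)
        flipped = [c for c, label in zip(candidates, classify(candidates))
                   if label != original_class]
        if flipped:
            # The closest sample of a different class approximates the
            # nearest point on the decision boundary.
            return min(flipped, key=lambda c: math.dist(c, data_row))
        radius += radius_increment  # no boundary found yet: grow the sphere
    raise RuntimeError('The maximum number of iterations was reached without '
                       'discovering a decision boundary.')


def sample_around_boundary(data_row: Row, classify, sphere_radius: float = 0.05,
                           samples_number: int = 50, rng=None, **kwargs) -> List[Row]:
    """Sample uniformly in a ball of sphere_radius around the boundary point."""
    rng = rng if rng is not None else random.Random()
    boundary = find_boundary_point(data_row, classify, rng=rng, **kwargs)
    return _uniform_ball(boundary, sphere_radius, samples_number, rng)


def step_classifier(rows: List[Row]) -> List[int]:
    # Toy model whose decision boundary is the line x0 + x1 = 1.
    return [int(r[0] + r[1] > 1.0) for r in rows]


samples = sample_around_boundary([0.0, 0.0], step_classifier, rng=random.Random(1))
```

In this sketch the radius used in iteration k (counting from 0) is radius_init + k * radius_increment, so with the defaults the last of the 1000 iterations searches a radius of about 10. A boundary farther away than that raises the RuntimeError, which is why a larger radius_init, radius_increment or max_iter helps.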