fatf.utils.data.augmentation.DecisionBoundarySphere
class fatf.utils.data.augmentation.DecisionBoundarySphere(dataset: numpy.ndarray, predictive_function: Callable[[numpy.ndarray], numpy.ndarray], categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True, radius_init: float = 0.01, radius_increment: float = 0.01)
Sampling data in a hyper-sphere around the closest decision boundary.
New in version 0.0.2.
DecisionBoundarySphere implements an adapted version of the local surrogate sampling introduced by [LAUGEL2018DEFINING]. A hyper-sphere is grown around the specified data point until a decision boundary is found. Data points are then sampled uniformly within an l-2 hyper-sphere of a user-specified radius centred on a point on this decision boundary.
Note
Categorical features.
This augmenter does not currently support data sets with categorical features.
For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class:
fatf.utils.data.augmentation.Augmentation.
- LAUGEL2018DEFINING
Laugel, T., Renard, X., Lesot, M. J., Marsala, C., & Detyniecki, M. (2018). Defining locality for surrogates in post-hoc interpretablity. Workshop on Human Interpretability for Machine Learning (WHI) – International Conference on Machine Learning, 2018.
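The final step described above, uniform sampling within an l-2 hyper-sphere, can be illustrated with a short NumPy sketch. This is a standard technique (a random direction scaled by a suitably distributed length) and is not taken from the FAT Forensics source; the function name sample_in_l2_ball and its parameters are hypothetical.

```python
import numpy as np


def sample_in_l2_ball(centre, radius, samples_number, rng=None):
    """Draws points uniformly from an l-2 ball around ``centre``.

    Illustrative helper only; it is not part of the fatf API.
    """
    rng = np.random.default_rng() if rng is None else rng
    centre = np.asarray(centre, dtype=float)
    dims = centre.shape[0]

    # Normalised Gaussian vectors give uniformly distributed directions.
    directions = rng.normal(size=(samples_number, dims))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Scaling by U ** (1 / dims) spreads the points uniformly over the
    # volume of the ball instead of concentrating them near the centre.
    lengths = radius * rng.uniform(size=(samples_number, 1)) ** (1.0 / dims)

    return centre + directions * lengths


points = sample_in_l2_ball([0.5, -1.0, 2.0], radius=0.05, samples_number=50,
                           rng=np.random.default_rng(42))
```

Every row of points lies within 0.05 of the centre in Euclidean distance, and the array has one row per requested sample and one column per feature.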
- Parameters
- predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
A Python callable, e.g., a function, that is either a classifier or a probabilistic predictor. This function is used to compute the class of the sampled data, which is used to identify a decision boundary. A probabilistic function is expected to output a 2-dimensional numpy array with the assigned class being the one with maximum probability. A classifier function is expected to output a 1-dimensional numpy array with class assignments. The predictive_function should require exactly one input parameter: a data array to be predicted.
- radius_init : float, optional (default=0.01)
The initial radius of the hyper-sphere placed around the specified data point to discover a decision boundary.
- radius_increment : float, optional (default=0.01)
The amount added to the hyper-sphere radius in every iteration of the sampling procedure in which no decision boundary has been discovered.
- Attributes
- predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
The predictive function used to initialise this class.
- is_probabilistic : boolean
True if the predictive_function is probabilistic, False otherwise. This is set based on the shape of the numpy array output by the predictive_function: if it is a 2-dimensional array, the predictive_function is assumed to be probabilistic; if it is a 1-dimensional array, the predictive_function is assumed to be a classifier.
- radius_init : float
The initial radius of a hyper-sphere placed around the specified data point within which new data points will be sampled to discover a decision boundary.
- radius_increment : float
The amount added to the hyper-sphere radius in every iteration of the sampling procedure in which no decision boundary has been discovered.
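The distinction between a classifier and a probabilistic predictor described above can be sketched as follows. The helper to_class_labels is hypothetical and only mirrors the documented rule (2-dimensional output means probabilistic, with the class given by the maximum probability); it is not claimed to be how the class implements the check internally.

```python
import numpy as np


def to_class_labels(predictions):
    """Returns ``(labels, is_probabilistic)`` for a predictor's output.

    Illustrative helper only; it is not part of the fatf API.
    """
    predictions = np.asarray(predictions)
    if predictions.ndim == 2:
        # Probabilistic predictor: one row of class probabilities per sample,
        # so the assigned class is the column with the highest probability.
        return np.argmax(predictions, axis=1), True
    if predictions.ndim == 1:
        # Classifier: one class label per sample.
        return predictions, False
    raise ValueError('Expected a 1- or 2-dimensional array.')


probabilities = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])
labels, is_probabilistic = to_class_labels(probabilities)
# labels holds [0, 1, 1] and is_probabilistic is True.
```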
- Raises
- IncompatibleModelError
The predictive_function does not require exactly one input parameter.
- NotImplementedError
Some of the features in the data set are categorical – this feature type is not supported at present.
- TypeError
The predictive_function parameter is not a Python callable. Either the radius_init or radius_increment parameter is not a number.
- ValueError
Either the radius_init or radius_increment parameter is less than or equal to 0.
Methods
sample(data_row, sphere_radius, …) Samples data around the closest decision boundary to the data_row.
sample(data_row: Union[numpy.ndarray, numpy.void], sphere_radius: float = 0.05, samples_number: int = 50, discover_samples_number: int = 100, max_iter: int = 1000) → numpy.ndarray
Samples data around the closest decision boundary to the data_row.
For the additional documentation of the input parameters, warnings and errors please see the description of the fatf.utils.data.augmentation.Augmentation.sample method in the parent fatf.utils.data.augmentation.Augmentation class.
- Parameters
- sphere_radius : float, optional (default=0.05)
Radius of the hyper-sphere around the closest decision boundary to data_row within which new data points will be sampled.
- discover_samples_number : integer, optional (default=100)
Number of samples generated at each iteration of the sampling procedure that are used to discover the nearest decision boundary around the data_row.
- max_iter : integer, optional (default=1000)
The maximum number of iterations of the procedure that grows the hyper-sphere around the data_row. If the limit is reached and a decision boundary has not been found, a RuntimeError is raised. In that case, consider initialising the class with a larger radius_init or radius_increment parameter. Alternatively, increasing the discover_samples_number or max_iter parameter may help to discover the nearest boundary with all the other parameters fixed.
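The interplay of radius_init, radius_increment, discover_samples_number and max_iter can be sketched with a simplified boundary search. This is an assumption-laden illustration of the documented behaviour (grow the radius until a sample of a different class appears, otherwise raise a RuntimeError), not the library's actual code; find_boundary_radius is a hypothetical name and the predictor is assumed to be a classifier returning a 1-dimensional array.

```python
import numpy as np


def find_boundary_radius(data_row, predict, radius_init=0.01,
                         radius_increment=0.01, discover_samples_number=100,
                         max_iter=1000, rng=None):
    """Grows a hyper-sphere around ``data_row`` until a class change is seen.

    Illustrative helper only; it is not part of the fatf API.
    """
    rng = np.random.default_rng() if rng is None else rng
    data_row = np.asarray(data_row, dtype=float)
    dims = data_row.shape[0]
    row_class = predict(data_row.reshape(1, -1))[0]

    radius = radius_init
    for _ in range(max_iter):
        # Draw candidate points uniformly within the current hyper-sphere.
        directions = rng.normal(size=(discover_samples_number, dims))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        lengths = radius * rng.uniform(
            size=(discover_samples_number, 1)) ** (1.0 / dims)
        candidates = data_row + directions * lengths

        # A candidate of a different class means a boundary lies within reach.
        if np.any(predict(candidates) != row_class):
            return radius
        radius += radius_increment

    raise RuntimeError('No decision boundary found within max_iter iterations.')


def threshold_classifier(data):
    """Toy classifier with a decision boundary at x0 = 0.5."""
    return (data[:, 0] > 0.5).astype(int)


found_radius = find_boundary_radius(
    [0.0, 0.0], threshold_classifier, radius_init=0.1, radius_increment=0.1,
    rng=np.random.default_rng(0))
```

In this toy setting the boundary is 0.5 away from the data row, so the search can only succeed once the radius exceeds 0.5. A larger radius_increment reaches that point in fewer iterations, while a larger discover_samples_number makes it more likely that a thin sliver beyond the boundary is hit as soon as it is reachable.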
- Returns
- samples : numpy.ndarray
A numpy array of shape [samples_number, number of features] that holds the sampled data.
- Raises
- NotImplementedError
The data_row is None: sampling from the mean of the dataset used to initialise this class is not yet implemented.
- RuntimeError
The maximum number of iterations was reached without the algorithm discovering a decision boundary.
- TypeError
The sphere_radius parameter is not a number. The discover_samples_number or max_iter parameter is not an integer.
- ValueError
The sphere_radius, discover_samples_number or max_iter parameter is not a positive number (greater than 0).
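A minimal usage sketch, assuming FAT Forensics is installed and importable under the module path shown on this page. The toy dataset, the predict function and the wrapper sample_near_boundary are hypothetical; only the constructor and sample keyword names are taken from the signatures documented above. The import is placed inside the wrapper so that the rest of the sketch runs without the package.

```python
import numpy as np

# Hypothetical purely numerical data set: 200 points in the unit square.
dataset = np.random.default_rng(0).uniform(size=(200, 2))


def predict(data):
    """Toy classifier taking exactly one input: a 2-dimensional data array."""
    return (data[:, 0] + data[:, 1] > 1.0).astype(int)


def sample_near_boundary(data_row):
    """Samples 20 points around the decision boundary closest to data_row."""
    # Imported lazily; assumes the fatf package is available.
    import fatf.utils.data.augmentation as fatf_augmentation

    augmenter = fatf_augmentation.DecisionBoundarySphere(
        dataset, predict, radius_init=0.05, radius_increment=0.05)
    # A RuntimeError here means no boundary was found within max_iter
    # iterations; larger radius parameters are the documented remedy.
    return augmenter.sample(
        data_row, sphere_radius=0.1, samples_number=20,
        discover_samples_number=100, max_iter=1000)
```

Calling sample_near_boundary(dataset[0]) would be expected to return an array of shape [20, 2], in line with the Returns description above.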