fatf.utils.data.augmentation.DecisionBoundarySphere

class fatf.utils.data.augmentation.DecisionBoundarySphere(dataset: numpy.ndarray, predictive_function: Callable[[numpy.ndarray], numpy.ndarray], categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True, radius_init: float = 0.01, radius_increment: float = 0.01)
- Sampling data in a hyper-sphere around the closest decision boundary.
- New in version 0.0.2.
- DecisionBoundarySphere implements an adapted version of the local surrogate sampling introduced by [LAUGEL2018DEFINING]. A hyper-sphere is grown around the specified data point until a decision boundary is found; then, from a point on this decision boundary, data points are sampled uniformly in an l-2 hyper-sphere with a user-predefined radius.
- Note: Categorical features. This augmenter does not currently support data sets with categorical features.
- For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation.
- LAUGEL2018DEFINING
- Laugel, T., Renard, X., Lesot, M. J., Marsala, C., & Detyniecki, M. (2018). Defining locality for surrogates in post-hoc interpretability. Workshop on Human Interpretability for Machine Learning (WHI) – International Conference on Machine Learning, 2018.
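The two-stage procedure described above can be sketched in plain NumPy. This is an illustrative approximation, not fatf's implementation: the helper names (`sample_in_l2_ball`, `sample_around_boundary`), the toy classifier, the fixed seed, and the choice of the closest differently-classified sample as the boundary point are all assumptions made for the example.

```python
import numpy as np


def sample_in_l2_ball(centre, radius, n_samples, rng):
    """Draws points uniformly from an L2 ball around ``centre``."""
    dim = centre.shape[0]
    # Normalised Gaussian vectors give directions uniform on the unit sphere.
    directions = rng.normal(size=(n_samples, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Taking the dim-th root of a uniform variate spreads the points
    # uniformly over the volume of the ball rather than crowding the centre.
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / dim)
    return centre + directions * radii


def sample_around_boundary(data_row, predict, sphere_radius=0.05,
                           samples_number=50, radius_init=0.01,
                           radius_increment=0.01,
                           discover_samples_number=100, max_iter=1000,
                           seed=0):
    """Hypothetical stand-in for the sampling procedure (classifier only)."""
    rng = np.random.default_rng(seed)
    row_class = predict(data_row[np.newaxis, :])[0]
    radius = radius_init
    for _ in range(max_iter):
        # Stage 1: probe a growing sphere around the data row.
        candidates = sample_in_l2_ball(
            data_row, radius, discover_samples_number, rng)
        other = candidates[predict(candidates) != row_class]
        if other.shape[0] > 0:
            # Assumption: the closest differently-classified probe is used
            # as the approximate point on the decision boundary.
            distances = np.linalg.norm(other - data_row, axis=1)
            boundary_point = other[np.argmin(distances)]
            # Stage 2: sample uniformly around the boundary point.
            return sample_in_l2_ball(
                boundary_point, sphere_radius, samples_number, rng)
        radius += radius_increment
    raise RuntimeError('No decision boundary found within max_iter.')


def toy_classifier(data):
    """Toy model: class 1 when the first feature exceeds 1, else class 0."""
    return (data[:, 0] > 1.0).astype(int)


samples = sample_around_boundary(np.array([0.0, 0.0]), toy_classifier)
print(samples.shape)  # (50, 2)
```

With the toy classifier the boundary lies at a distance of 1 from the data row, so the probing sphere has to grow for roughly a hundred iterations before any probe crosses it.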
 - Parameters
- predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
- A Python callable, e.g., a function, that is either a classifier or a probabilistic predictor. This function is used to compute the class of the sampled data, which is used to identify a decision boundary. A probabilistic function is expected to output a 2-dimensional numpy array with the assigned class being the one with maximum probability. A classifier function is expected to output a 1-dimensional numpy array with class assignment. The predictive_function should require exactly one input parameter: a data array to be predicted.
- radius_init : float, optional (default=0.01)
- The initial radius of the specified data point around which a hyper-sphere will be placed to discover a decision boundary.
- radius_increment : float, optional (default=0.01)
- The additive increment to the initial hyper-sphere radius by which it will be incremented (in every iteration of the sampling procedure) if no decision boundary has been discovered.
 
- Attributes
- predictive_function : Callable[[numpy.ndarray], numpy.ndarray]
- The predictive function used to initialise this class.
- is_probabilistic : boolean
- True if the predictive_function is probabilistic, False otherwise. This is set based on the shape of the numpy array output by the predictive_function: if it is a 2-dimensional array, the predictive_function is assumed to be probabilistic; if it is a 1-dimensional array, the predictive_function is assumed to be a classifier.
- radius_init : float
- The initial radius of a hyper-sphere placed around the specified data point within which new data points will be sampled to discover a decision boundary.
- radius_increment : float
- The additive increment to the initial hyper-sphere radius by which it will be incremented (in every iteration of the sampling procedure) if no decision boundary has been discovered.
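The shape-based rule behind is_probabilistic can be illustrated as follows. This is a minimal sketch, not fatf's code: the helper name `to_class_labels` is hypothetical, and it assumes that the columns of a probabilistic output are ordered so that the column index can serve as the class label.

```python
import numpy as np


def to_class_labels(predictions):
    """Returns ``(is_probabilistic, labels)`` for a model output array."""
    predictions = np.asarray(predictions)
    if predictions.ndim == 2:
        # Probabilistic output: one row per sample, one column per class;
        # the assigned class is the column with the maximum probability.
        return True, np.argmax(predictions, axis=1)
    if predictions.ndim == 1:
        # Classifier output: the array already holds the class assignment.
        return False, predictions
    raise ValueError('Expected a 1- or 2-dimensional prediction array.')


crisp = np.array([0, 1, 1])
probabilities = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])

print(to_class_labels(crisp))
print(to_class_labels(probabilities))
```

Both calls yield the same class assignment; only the reported flag differs.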
 
- Raises
- IncompatibleModelError
- The predictive_function does not require exactly one input parameter.
- NotImplementedError
- Some of the features in the data set are categorical; this feature type is not supported at present.
- TypeError
- The predictive_function parameter is not a Python callable. Either the radius_init or radius_increment parameter is not a number.
- ValueError
- Either the radius_init or radius_increment parameter is less than or equal to 0.
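The documented initialisation checks could look roughly like the sketch below. It is an assumption-laden illustration rather than fatf's validation code: `validate_init_arguments` is a hypothetical helper, the local `IncompatibleModelError` class merely stands in for the fatf exception of the same name, and the order of the checks and the way required parameters are counted are guesses.

```python
import inspect
from numbers import Number


class IncompatibleModelError(Exception):
    """Local stand-in for the fatf exception of the same name."""


def validate_init_arguments(predictive_function, radius_init=0.01,
                            radius_increment=0.01):
    """Mirrors the documented TypeError/ValueError conditions."""
    if not callable(predictive_function):
        raise TypeError('predictive_function must be a Python callable.')

    # Count positional parameters that have no default value.
    positional = (inspect.Parameter.POSITIONAL_ONLY,
                  inspect.Parameter.POSITIONAL_OR_KEYWORD)
    parameters = inspect.signature(predictive_function).parameters.values()
    required = [p for p in parameters
                if p.kind in positional
                and p.default is inspect.Parameter.empty]
    if len(required) != 1:
        raise IncompatibleModelError(
            'predictive_function must require exactly one input parameter.')

    for name, value in (('radius_init', radius_init),
                        ('radius_increment', radius_increment)):
        if not isinstance(value, Number):
            raise TypeError('{} must be a number.'.format(name))
        if value <= 0:
            raise ValueError('{} must be greater than 0.'.format(name))


validate_init_arguments(lambda data: data, radius_init=0.5)  # passes silently
```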
 
- Methods
- sample(data_row, sphere_radius, …): Samples data around the closest decision boundary to the data_row.

sample(data_row: Union[numpy.ndarray, numpy.void], sphere_radius: float = 0.05, samples_number: int = 50, discover_samples_number: int = 100, max_iter: int = 1000) → numpy.ndarray
- Samples data around the closest decision boundary to the data_row.
- For additional documentation of the input parameters, warnings and errors please see the description of the fatf.utils.data.augmentation.Augmentation.sample method in the parent fatf.utils.data.augmentation.Augmentation class.
- Parameters
- sphere_radius : float, optional (default=0.05)
- Radius of the hyper-sphere around the closest decision boundary to data_row within which new data points will be sampled.
- discover_samples_number : integer, optional (default=100)
- Number of samples generated at each iteration of the sampling procedure that are used to discover the nearest decision boundary around the data_row.
- max_iter : integer, optional (default=1000)
- The maximum number of iterations for the iterative hyper-sphere growing (around the data_row) procedure. If the limit is reached and a decision boundary has not been found, a RuntimeError is raised. If this is the case, you may want to consider initialising the class with a larger radius_init or radius_increment parameter. Alternatively, increasing the discover_samples_number or max_iter parameter may help to discover the nearest boundary with all the other parameters fixed.
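To judge whether the defaults can reach a boundary at all, it helps to estimate the largest radius the search will probe. The helper below is a hypothetical back-of-the-envelope calculation; it assumes the radius starts at radius_init and grows by radius_increment after each unsuccessful iteration, which this page implies but does not spell out.

```python
def largest_search_radius(radius_init=0.01, radius_increment=0.01,
                          max_iter=1000):
    """Radius probed in the final iteration under the stated assumption."""
    return radius_init + (max_iter - 1) * radius_increment


# With the defaults the search stops at a radius of about 10.
print(largest_search_radius())
# A ten times larger increment extends the reach to about 100.
print(largest_search_radius(radius_increment=0.1))
```

If the features are not normalised and the nearest boundary lies further away than this radius, a RuntimeError is to be expected regardless of discover_samples_number.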
 
- Returns
- samples : numpy.ndarray
- A numpy array of shape [samples_number, number of features] that holds the sampled data.
 
- Raises
- NotImplementedError
- The data_row is None; sampling from the mean of the dataset used to initialise this class is not yet implemented.
- RuntimeError
- The maximum number of iterations was reached without the algorithm discovering a decision boundary.
- TypeError
- The sphere_radius parameter is not a number. The discover_samples_number or max_iter parameter is not an integer.
- ValueError
- The sphere_radius, discover_samples_number or max_iter parameter is not a positive number (greater than 0).