fatf.utils.data.augmentation.Mixup
class fatf.utils.data.augmentation.Mixup(dataset: numpy.ndarray, ground_truth: Optional[numpy.ndarray] = None, categorical_indices: Optional[numpy.ndarray] = None, beta_parameters: Optional[Tuple[float, float]] = None, int_to_float: bool = True)
Sampling data with the Mixup method.
This object implements the Mixup method introduced by [ZHANG2018MIXUP]. For a specific data point it selects points at random from the dataset (making sure that the sample is stratified when the ground_truth parameter is given), then it draws samples from a Beta distribution and forms new data points (samples) as convex combinations of the original data point and the randomly selected dataset points.
Note
Sampling from the dataset mean is not yet implemented.
For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class, fatf.utils.data.augmentation.Augmentation, and of the function that validates the input parameters, fatf.utils.data.augmentation._validate_input_mixup.
- ZHANG2018MIXUP
Zhang, H., Cisse, M., Dauphin, Y. N. and Lopez-Paz, D., 2018. mixup: Beyond Empirical Risk Minimization. International Conference on Learning Representations (ICLR 2018).
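The sampling procedure described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the class's own code: `mixup_sample` is a hypothetical helper that handles numerical features only, skips stratification and label sampling, and assumes the Beta weight λ multiplies the data point itself (the paper's formulation is λx_i + (1 - λ)x_j).

```python
import numpy as np


def mixup_sample(dataset, data_row, samples_number=50, beta_parameters=(2, 5),
                 with_replacement=True, seed=None):
    """Mixes ``data_row`` with randomly selected rows of ``dataset``.

    Simplified sketch: numerical features only, no stratification by the
    ground truth and no label sampling.
    """
    rng = np.random.default_rng(seed)
    # Select random dataset rows to mix with the data point.
    indices = rng.choice(dataset.shape[0], size=samples_number,
                         replace=with_replacement)
    random_rows = dataset[indices]
    # Draw one mixing weight per sample from Beta(a, b); reshape to a column
    # vector so it broadcasts against the [samples_number, features] matrix.
    lambdas = rng.beta(beta_parameters[0], beta_parameters[1],
                       size=samples_number)[:, np.newaxis]
    # Convex combination (assumption: lambda weights the data point itself).
    return lambdas * data_row + (1 - lambdas) * random_rows


dataset = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 4.0]])
samples = mixup_sample(dataset, np.array([1.0, 2.0]), samples_number=5, seed=42)
print(samples.shape)  # (5, 2)
```

Because every sample is a convex combination, each feature value lies between the data point's value and the value of the dataset row it was mixed with.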
- Parameters
- beta_parameters : Tuple[number, number], optional (default=None)
A pair of numerical parameters used with the Beta distribution. If None, the beta parameters will be set to (2, 5).
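A Beta(α, β) distribution has mean α / (α + β), so the default pair (2, 5) produces mixing weights that average 2/7 ≈ 0.29. The short check below draws from that distribution with NumPy; `resolve_beta_parameters` is a hypothetical helper that only mirrors the None → (2, 5) default stated above.

```python
import numpy as np


def resolve_beta_parameters(beta_parameters=None):
    """Hypothetical helper: falls back to the documented default of (2, 5)."""
    return (2, 5) if beta_parameters is None else beta_parameters


alpha, beta = resolve_beta_parameters(None)
rng = np.random.default_rng(0)
weights = rng.beta(alpha, beta, size=100_000)

print(alpha / (alpha + beta))  # theoretical mean, about 0.2857
print(weights.mean())          # empirical mean, close to the value above
```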
- Attributes
- threshold : number
A threshold used for mixing the random sample from the dataset with the instance used to generate a sample. The threshold value is 0.5.
- beta_parameters : Tuple[number, number]
A pair of numbers used with the Beta distribution sampling.
- ground_truth_unique : np.ndarray
A sorted array holding all the unique values of the ground truth.
- ground_truth_frequencies : np.ndarray
An array holding the frequencies of all the unique values in the ground truth array. The order of the frequencies corresponds to the order of the unique values. The frequencies are normalised and sum up to 1.
- indices_per_label : List[np.ndarray]
A list of arrays holding the dataset row indices corresponding to each of the unique ground truth values. The order of this list corresponds to the order of the unique values.
- ground_truth_probabilities : np.ndarray
A numpy array of shape [number of dataset instances, number of unique ground truth values] that holds a one-hot encoding (pseudo-probabilities) of the ground truth labels. The column ordering of this array corresponds to the order of the unique values.
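Following the attribute descriptions above, the label-related attributes can be reproduced with plain NumPy. This is an illustrative sketch rather than the class's own code, and the toy `ground_truth` vector is made up. For it, the unique labels are 'a', 'b' and 'c' with frequencies 1/3, 1/2 and 1/6.

```python
import numpy as np

ground_truth = np.array(['b', 'a', 'b', 'c', 'b', 'a'])

# Sorted unique labels and the number of times each one occurs.
ground_truth_unique, counts = np.unique(ground_truth, return_counts=True)
# Normalised frequencies (they sum up to 1).
ground_truth_frequencies = counts / counts.sum()
# Dataset row indices belonging to each unique label, in the same order.
indices_per_label = [np.where(ground_truth == label)[0]
                     for label in ground_truth_unique]
# One-hot pseudo-probabilities: one row per instance, one column per label.
ground_truth_probabilities = (
    ground_truth[:, np.newaxis] == ground_truth_unique[np.newaxis, :]
).astype(float)

print(ground_truth_unique)   # ['a' 'b' 'c']
print(indices_per_label[1])  # [0 2 4]
```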
- Raises
- TypeError
The beta_parameters parameter is neither None nor a tuple. One of the values in the beta_parameters tuple is not a number.
- ValueError
The beta_parameters tuple is not a pair (2-tuple). One of the numbers in the beta_parameters tuple is not positive.
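The checks listed under Raises can be expressed as a small validation function. The sketch below is hypothetical (it is not fatf's _validate_input_mixup); it uses numbers.Real to decide what counts as a number and, as its own design choice, also rejects booleans.

```python
import numbers
from typing import Optional, Tuple


def validate_beta_parameters(
        beta_parameters: Optional[Tuple[float, float]]) -> bool:
    """Hypothetical re-implementation of the checks listed under Raises."""
    if beta_parameters is None:
        return True
    if not isinstance(beta_parameters, tuple):
        raise TypeError('The beta_parameters parameter has to be None or a '
                        'tuple.')
    if len(beta_parameters) != 2:
        raise ValueError('The beta_parameters parameter has to be a pair '
                         '(2-tuple).')
    for value in beta_parameters:
        # bool is a subclass of int; this sketch rejects it explicitly.
        if isinstance(value, bool) or not isinstance(value, numbers.Real):
            raise TypeError('Each beta parameter has to be a number.')
        if value <= 0:
            raise ValueError('Each beta parameter has to be positive.')
    return True
```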
Methods
sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, …)
Samples new data around the provided data_row using the Mixup method.
sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, data_row_target: Union[str, float, None] = None, samples_number: int = 50, with_replacement: bool = True, return_probabilities: bool = False) → Tuple[numpy.ndarray, ...]
Samples new data around the provided data_row using the Mixup method.
If data_row_target is None, only sampled data will be returned. Otherwise, if data_row_target is provided, the Mixup class will also attempt to sample labels. In this case the labels can either be an array of class probabilities, when the return_probabilities parameter is set to True, or an array with a single label per instance, selected based on the highest probability, when the return_probabilities parameter is set to False.
Note
Sampling from the dataset mean is not yet implemented.
For the documentation of extra parameters, warnings and errors please see the description of the sample method in the parent fatf.utils.data.augmentation.Augmentation class.
- Parameters
- data_row_target : Union[number, string], optional (default=None)
A label (class) of the provided data_row. If None, the function will only return sampled data; otherwise it will also return targets for the sampled data.
- with_replacement : boolean, optional (default=True)
If True, data points are sampled with replacement from the original dataset.
- return_probabilities : boolean, optional (default=False)
If True, the target (class) samples for the sampled data points are in the form of a class probability matrix; otherwise they are a flat array with the target labels.
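When ground truth is available, the random dataset rows are stratified and with_replacement controls whether a row may be drawn more than once. One way to do this is sketched below; `stratified_indices` is hypothetical, and drawing per-class counts from a multinomial over ground_truth_frequencies is an assumption (the library may allocate them differently). Without replacement, NumPy raises a ValueError if a class is asked for more rows than it has.

```python
import numpy as np


def stratified_indices(indices_per_label, ground_truth_frequencies,
                       samples_number, with_replacement=True, rng=None):
    """Hypothetical helper: draws dataset row indices whose labels follow
    the ground truth class distribution."""
    rng = np.random.default_rng() if rng is None else rng
    # Decide how many rows to draw from each class.
    per_label_counts = rng.multinomial(samples_number, ground_truth_frequencies)
    chosen = [
        rng.choice(indices, size=count, replace=with_replacement)
        for indices, count in zip(indices_per_label, per_label_counts)
    ]
    indices = np.concatenate(chosen)
    # Shuffle in place so rows of the same class are not grouped together.
    rng.shuffle(indices)
    return indices


indices_per_label = [np.array([1, 5]), np.array([0, 2, 4]), np.array([3])]
frequencies = np.array([2, 3, 1]) / 6
print(stratified_indices(indices_per_label, frequencies, 10,
                         rng=np.random.default_rng(0)))
```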
- Returns
- samples : numpy.ndarray
A numpy array of shape [samples_number, number of features] that holds the sampled data.
- samples_target : numpy.ndarray, optional (returned when the data_row_target parameter is not None)
Either a numpy array of shape [samples_number, number of unique labels (classes)] holding the class probabilities for the sampled data or a 1-dimensional numpy array with labels for the sampled data.
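The two return formats of samples_target can be illustrated by mixing one-hot labels and, when probabilities are not requested, keeping the most probable class. In this sketch `mix_targets` is hypothetical, and mixing the labels with the same Beta weights as the features is an assumption borrowed from the Mixup paper rather than a documented detail of this class.

```python
import numpy as np


def mix_targets(labels, target_one_hot, random_rows_one_hot, lambdas,
                return_probabilities=False):
    """Hypothetical sketch: mixes one-hot labels with the feature weights and
    optionally collapses each row to a single label."""
    lambdas = np.asarray(lambdas)[:, np.newaxis]
    # Mixed class probabilities: each row is non-negative and sums to 1.
    probabilities = (lambdas * target_one_hot
                     + (1 - lambdas) * random_rows_one_hot)
    if return_probabilities:
        return probabilities  # shape [samples_number, number of classes]
    # Otherwise return the label with the highest probability per sample.
    return labels[probabilities.argmax(axis=1)]


labels = np.array(['a', 'b', 'c'])
target = np.array([0.0, 1.0, 0.0])       # one-hot encoding of data_row_target 'b'
random_rows = np.array([[1.0, 0.0, 0.0],  # one-hot labels of the randomly
                        [0.0, 0.0, 1.0],  # selected dataset rows
                        [0.0, 1.0, 0.0]])
lambdas = np.array([0.3, 0.8, 0.5])
print(mix_targets(labels, target, random_rows, lambdas))  # ['a' 'b' 'b']
```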
- Raises
- NotImplementedError
Raised when the user tries to sample around the mean of the dataset; this functionality is not yet implemented.
- TypeError
The return_probabilities or with_replacement parameters are not booleans. The data_row_target parameter is neither a number nor a string.
- ValueError
The data_row_target parameter has a value that does not appear in the ground truth vector used to initialise this class.
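The exceptions listed above amount to a handful of input checks. The function below is a hypothetical sketch of them (it is not part of fatf); passing the sorted unique labels as `ground_truth_unique` is an assumption made for the membership test.

```python
import numbers

import numpy as np


def validate_sample_input(data_row, data_row_target, with_replacement,
                          return_probabilities, ground_truth_unique=None):
    """Hypothetical checks mirroring the Raises section of ``sample``."""
    if data_row is None:
        raise NotImplementedError('Sampling around the dataset mean is not '
                                  'yet implemented.')
    for name, value in (('with_replacement', with_replacement),
                        ('return_probabilities', return_probabilities)):
        if not isinstance(value, bool):
            raise TypeError('{} has to be a boolean.'.format(name))
    if data_row_target is not None:
        if not isinstance(data_row_target, (numbers.Number, str)):
            raise TypeError('data_row_target has to be a number or a string.')
        if (ground_truth_unique is not None
                and data_row_target not in ground_truth_unique):
            raise ValueError('data_row_target does not appear in the ground '
                             'truth.')
    return True


labels = np.array(['a', 'b', 'c'])
print(validate_sample_input(np.array([1.0, 2.0]), 'b', True, False, labels))  # True
```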
- Warns
- UserWarning
The user is warned when the data_row_target parameter is given but the Mixup class was initialised without the ground truth for the dataset; in that case sampling target values is not possible and the data_row_target parameter will be ignored. The user is also warned that the random row indices will not be stratified according to the ground truth distribution if the ground truth vector was not given when this class was initialised.
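The warn-and-ignore behaviour described above can be sketched with the standard warnings module. `resolve_target` is a hypothetical helper; emitting the stratification warning whenever the ground truth is missing, regardless of data_row_target, is one reading of the text above and an assumption of this sketch.

```python
import warnings


def resolve_target(data_row_target, ground_truth):
    """Hypothetical sketch of the warnings described above.

    Returns the target that sampling should use; None means that only
    data (and no labels) will be sampled.
    """
    if ground_truth is None:
        if data_row_target is not None:
            warnings.warn('The class was initialised without ground truth, '
                          'so data_row_target will be ignored.', UserWarning)
        warnings.warn('Random row indices will not be stratified because '
                      'the ground truth was not given at initialisation.',
                      UserWarning)
        return None
    return data_row_target
```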