fatf.utils.data.augmentation.Mixup

class fatf.utils.data.augmentation.Mixup(dataset: numpy.ndarray, ground_truth: Optional[numpy.ndarray] = None, categorical_indices: Optional[numpy.ndarray] = None, beta_parameters: Optional[Tuple[float, float]] = None, int_to_float: bool = True)

Sampling data with the Mixup method.

This object implements the Mixup method introduced by [ZHANG2018MIXUP]. For a specific data point it selects points at random from the dataset (making sure that the sample is stratified when the ground_truth parameter is given), then draws samples from a Beta distribution and forms new data points (samples) as convex combinations of the original data point and the randomly sampled dataset points.
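The procedure described above can be illustrated with a minimal NumPy sketch. This is not the fatf implementation: the function name mixup_sketch is hypothetical, only numerical features are handled, the rows are drawn without stratification, and which of the two points receives the Beta-distributed weight is an assumption.

```python
import numpy as np


def mixup_sketch(data_row, dataset, samples_number=50,
                 beta_parameters=(2, 5), seed=None):
    """Mix ``data_row`` with random dataset rows (numerical features only).

    Hypothetical helper for illustration; not part of fatf.
    """
    rng = np.random.default_rng(seed)
    alpha, beta = beta_parameters

    # Pick random rows from the dataset (with replacement, unstratified).
    row_indices = rng.integers(0, dataset.shape[0], size=samples_number)
    random_rows = dataset[row_indices]

    # One mixing weight per sample, each drawn from Beta(alpha, beta).
    lam = rng.beta(alpha, beta, size=(samples_number, 1))

    # Convex combination of the data point and the sampled rows.
    # Assumption: the weight lam is applied to data_row.
    return lam * data_row + (1 - lam) * random_rows


dataset = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 8.0]])
samples = mixup_sketch(dataset[0], dataset, samples_number=5, seed=42)
```

Because every sample is a convex combination of two dataset rows, each sampled feature value lies between the minimum and maximum of that feature in the dataset.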

Note

Sampling from the dataset mean is not yet implemented.

For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation and the function that validates the input parameters fatf.utils.data.augmentation._validate_input_mixup.

ZHANG2018MIXUP

Zhang, H., Cisse, M., Dauphin, Y. N. and Lopez-Paz, D., 2018. mixup: Beyond Empirical Risk Minimization. International Conference on Learning Representations (ICLR 2018).

Parameters
beta_parameters : Tuple[number, number], optional (default=None)

A pair of numerical parameters used with the Beta distribution. If None, the beta parameters will be set to (2, 5).

Attributes
threshold : number

A threshold used for mixing the random sample from the dataset with the instance used to generate a sample. The threshold value is 0.5.

beta_parameters : Tuple[number, number]

A pair of numbers used with the Beta distribution sampling.

ground_truth_unique : np.ndarray

A sorted array holding all the unique values of the ground truth.

ground_truth_frequencies : np.ndarray

An array holding frequencies of all the unique values in the ground truth array. The order of the frequencies corresponds to the order of the unique values. The frequencies are normalised and sum up to 1.

indices_per_label : List[np.ndarray]

A list of arrays holding (dataset) row indices corresponding to each of the unique ground truth values. The order of this list corresponds with the order of the unique values.

ground_truth_probabilities : np.ndarray

A numpy array of [number of dataset instances, number of unique ground truth values] shape that holds one-hot encoding (pseudo-probabilities) of the ground truth labels. The column ordering of this array corresponds with the order of the unique values.
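As an illustration of what the ground truth attributes described above hold, the following sketch derives equivalent arrays from a small label vector with plain NumPy. The variable names and the example labels are hypothetical, and the class may compute these attributes differently.

```python
import numpy as np

ground_truth = np.array(['b', 'a', 'b', 'c', 'b', 'a'])

# Sorted unique labels and how often each one occurs.
unique, counts = np.unique(ground_truth, return_counts=True)

# Normalised frequencies, in the same order as `unique`.
frequencies = counts / counts.sum()

# Dataset row indices grouped by label, in the same order as `unique`.
indices_per_label = [np.where(ground_truth == label)[0] for label in unique]

# One-hot encoding: one row per instance, one column per unique label.
probabilities = (ground_truth[:, np.newaxis] == unique).astype(float)
```

Here the unique labels are a, b and c, so the first instance (label b) is encoded as the row [0, 1, 0].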

Raises
TypeError

The beta_parameters parameter is neither None nor a tuple. One of the values in the beta_parameters tuple is not a number.

ValueError

The beta_parameters tuple is not a pair (2-tuple). One of the numbers in the beta_parameters tuple is not positive.

Methods

sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, …)

Samples new data around the provided data_row using the Mixup method.

sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, data_row_target: Union[str, float, None] = None, samples_number: int = 50, with_replacement: bool = True, return_probabilities: bool = False) → Tuple[numpy.ndarray, ...]

Samples new data around the provided data_row using the Mixup method.

If data_row_target is None, only sampled data will be returned. Otherwise, if data_row_target is provided, the Mixup class will also attempt to sample labels. In this case the labels can either be an array of class probabilities when the return_probabilities parameter is set to True, or an array with a single label per instance selected based on the highest probability when the return_probabilities parameter is set to False.

Note

Sampling from the dataset mean is not yet implemented.

For the documentation of extra parameters, warnings and errors please see the description of the sample method in the parent fatf.utils.data.augmentation.Augmentation class.

Parameters
data_row_target : Union[number, string], optional (default=None)

A label (class) of the provided data_row. If None the function will only return sampled data, otherwise it will also return targets for the sampled data.

with_replacement : boolean, optional (default=True)

If True, data points are sampled with replacement from the original dataset.

return_probabilities : boolean, optional (default=False)

If True, the target (class) samples for the sampled data points are returned in the form of a class probability matrix; otherwise they are a flat array with the target labels.

Returns
samples : numpy.ndarray

A numpy array of shape [samples_number, number of features] that holds the sampled data.

samples_target : numpy.ndarray, optional (returned when the data_row_target parameter is not None)

Either a numpy array of shape [samples_number, number of unique labels (classes)] holding the class probabilities for the sampled data or a 1-dimensional numpy array with labels for the sampled data.
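The relation between the two forms of samples_target can be sketched as follows. The probability values and label names below are made up for illustration; the sketch only shows how a class probability matrix collapses to a flat label array by picking the highest probability in each row, as described above.

```python
import numpy as np

# Unique labels, in the column order of the probability matrix.
unique_labels = np.array(['a', 'b', 'c'])

# Hypothetical class probabilities for four sampled points
# (the form returned when return_probabilities is True).
probabilities = np.array([
    [0.7, 0.3, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 0.4, 0.6],
    [0.1, 0.0, 0.9],
])

# Flat label array: the label with the highest probability in each row
# (the form returned when return_probabilities is False).
labels = unique_labels[np.argmax(probabilities, axis=1)]
```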

Raises
NotImplementedError

Raised when the user is trying to sample around the mean of the dataset – this functionality is not yet implemented.

TypeError

The return_probabilities or with_replacement parameters are not booleans. The data_row_target parameter is neither a number nor a string.

ValueError

The data_row_target parameter has a value that does not appear in the ground truth vector used to initialise this class.

Warns
UserWarning

The user is warned when the data_row_target parameter is given but the Mixup class was initialised without the ground truth for the dataset; in this case sampling target values is not possible and the data_row_target parameter is ignored. The user is also warned that the random row indices will not be stratified according to the ground truth distribution if the ground truth vector was not given when this class was initialised.
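To make the difference between stratified and unstratified row indices concrete, here is a minimal NumPy sketch. How the class allocates the number of rows per label is not stated in this documentation; rounding the proportional share and assigning any remainder to the most frequent label is an assumption made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
ground_truth = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
samples_number = 20

unique, counts = np.unique(ground_truth, return_counts=True)
frequencies = counts / counts.sum()

# Stratified: fix the number of rows per label in proportion to the
# label frequencies (assumed allocation rule), then draw within each label.
per_label = np.round(frequencies * samples_number).astype(int)
per_label[np.argmax(frequencies)] += samples_number - per_label.sum()
stratified = np.concatenate([
    rng.choice(np.where(ground_truth == label)[0], size=n, replace=True)
    for label, n in zip(unique, per_label)
])

# Unstratified: draw row indices uniformly from the whole dataset, so the
# label proportions match the ground truth only on average.
unstratified = rng.choice(ground_truth.shape[0], size=samples_number,
                          replace=True)
```

With 20 samples and label frequencies of 0.6, 0.3 and 0.1, the stratified draw always contains 12, 6 and 2 rows of the respective labels, whereas the unstratified draw varies from run to run.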