fatf.utils.data.augmentation.NormalSampling

class fatf.utils.data.augmentation.NormalSampling(dataset: numpy.ndarray, categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True)

Sampling data from a normal distribution.

This class samples data according to a normal distribution. Sampling can be performed either around a particular data point (by passing the data_row parameter to the sample method) or around the mean of the whole dataset (if data_row is not given when calling the sample method). In both cases, each numerical feature is sampled using its standard deviation calculated over the whole dataset. For categorical features, values are sampled with replacement, with the probability of each unique value given by its relative frequency in the dataset.

For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation.

Attributes
numerical_sampling_values : Dictionary[column index, Tuple[number, number]]

Dictionary mapping the indices of numerical columns (features) to tuples of two numbers: the column’s mean and its standard deviation.

categorical_sampling_values : Dictionary[column index, Tuple[numpy.ndarray, numpy.ndarray]]

Dictionary mapping categorical column feature indices to tuples consisting of two 1-dimensional numpy arrays: one with unique values for that column and the other one with their normalised (summing up to 1) frequencies.
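To illustrate what these two attributes hold, the NumPy-only sketch below builds dictionaries with the same layout for a small toy array. It is not the library's implementation: the helper name `build_sampling_values` and the toy data are hypothetical, and using NumPy's default (population) standard deviation is an assumption of this sketch.

```python
import numpy as np


def build_sampling_values(dataset, categorical_indices):
    """Hypothetical helper mirroring the two attribute layouts described above."""
    numerical, categorical = {}, {}
    for index in range(dataset.shape[1]):
        column = dataset[:, index]
        if index in categorical_indices:
            # Unique values and their relative frequencies (summing up to 1).
            values, counts = np.unique(column, return_counts=True)
            categorical[index] = (values, counts / counts.sum())
        else:
            # Mean and standard deviation of the column; NumPy's default
            # ddof=0 is an assumption of this sketch.
            numerical[index] = (float(column.mean()), float(column.std()))
    return numerical, categorical


# Toy data: columns 0 and 1 are numerical, column 2 is categorical (coded 0/1).
toy_data = np.array([
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 1.0],
    [4.0, 40.0, 1.0],
])

numerical_values, categorical_values = build_sampling_values(toy_data, [2])
print(numerical_values[0])    # (mean, standard deviation) of column 0
print(categorical_values[2])  # (unique values, normalised frequencies)
```

Here the categorical column maps to the unique values 0 and 1 with frequencies 0.25 and 0.75, since the value 1 appears in three of the four rows.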

Methods

sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, …)

Samples new data from a normal distribution.

sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, samples_number: int = 50) → numpy.ndarray

Samples new data from a normal distribution.

If the data_row parameter is given, the sample will be centered around that data point. Otherwise, when data_row is None, the sample will be generated around the mean of the dataset used to initialise this class.

Numerical features are sampled from a normal distribution centered on their corresponding values in the data_row parameter, or on the mean of that feature in the dataset if data_row is not given; in both cases the standard deviation calculated from the dataset is used. Categorical features are sampled with replacement from all the values of that feature observed in the dataset, with the probability of each value equal to its frequency in the dataset. (This means that the value of a categorical feature in data_row is ignored.)
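The procedure described above can be sketched with plain NumPy as follows. This is a minimal illustration under stated assumptions, not the library's code: the function `normal_sample_sketch`, its `seed` argument and the toy data are hypothetical, the population standard deviation is assumed, and the output is simply a float array.

```python
import numpy as np


def normal_sample_sketch(dataset, categorical_indices, data_row=None,
                         samples_number=50, seed=None):
    """Hypothetical sketch of the sampling logic described above."""
    rng = np.random.default_rng(seed)
    samples = np.empty((samples_number, dataset.shape[1]), dtype=float)
    for index in range(dataset.shape[1]):
        column = dataset[:, index]
        if index in categorical_indices:
            # Draw with replacement, weighted by each value's frequency in
            # the dataset; the value in data_row plays no role here.
            values, counts = np.unique(column, return_counts=True)
            samples[:, index] = rng.choice(
                values, size=samples_number, replace=True,
                p=counts / counts.sum())
        else:
            # Centre on data_row if given, otherwise on the column mean; the
            # spread is always the dataset-wide standard deviation
            # (ddof=0 is an assumption of this sketch).
            centre = column.mean() if data_row is None else data_row[index]
            samples[:, index] = rng.normal(
                centre, column.std(), samples_number)
    return samples


# Toy data: columns 0 and 1 are numerical, column 2 is categorical (coded 0/1).
toy_data = np.array([
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 1.0],
    [4.0, 40.0, 1.0],
])

# Sample around the dataset mean, then around a specific data point.
around_mean = normal_sample_sketch(toy_data, [2], samples_number=1000, seed=0)
around_row = normal_sample_sketch(
    toy_data, [2], data_row=np.array([100.0, 10.0, 0.0]),
    samples_number=1000, seed=0)
```

In the second call the numerical columns are centred on 100 and 10, while the categorical column still follows the dataset frequencies, so the value 1 dominates even though the data point itself holds 0.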

For the documentation of parameters, warnings and errors please see the description of the sample method in the parent fatf.utils.data.augmentation.Augmentation class.