fatf.utils.data.augmentation.NormalSampling
class fatf.utils.data.augmentation.NormalSampling(dataset: numpy.ndarray, categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True)
Sampling data from a normal distribution.
This class samples data according to a normal distribution. The sampling can be performed either around a particular data point (by supplying the data_row parameter to the sample method) or around the mean of the whole dataset (if data_row is not given when calling the sample method). In both cases, the standard deviation of each numerical feature, calculated over the whole dataset, is used. For categorical features, values are sampled with replacement, with the probability of each unique value given by the frequency of its appearance in the dataset.
For additional parameters, attributes, warnings and exceptions raised by this class, please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation.
Attributes
- numerical_sampling_values : Dictionary[column index, Tuple[number, number]]
Dictionary mapping numerical column feature indices to tuples of two numbers: the column’s mean and its standard deviation.
- categorical_sampling_values : Dictionary[column index, Tuple[numpy.ndarray, numpy.ndarray]]
Dictionary mapping categorical column feature indices to tuples consisting of two 1-dimensional numpy arrays: one with the unique values for that column and the other with their normalised (summing up to 1) frequencies.
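The two attributes described above can be approximated with plain NumPy. The following is a minimal sketch, not the library's implementation: the helper name compute_sampling_values is hypothetical, it assumes a plain (non-structured) 2-dimensional numerical array in which categorical features are encoded as numbers, and the use of the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np


def compute_sampling_values(dataset, categorical_indices):
    """Hypothetical helper: builds dictionaries shaped like the two attributes."""
    categorical = set(categorical_indices)
    numerical_values = {}
    categorical_values = {}
    for index in range(dataset.shape[1]):
        column = dataset[:, index]
        if index in categorical:
            # Unique values and their frequencies, normalised to sum to 1.
            values, counts = np.unique(column, return_counts=True)
            categorical_values[index] = (values, counts / counts.sum())
        else:
            # Column mean and standard deviation (ddof=0 is an assumption).
            numerical_values[index] = (column.mean(), column.std())
    return numerical_values, categorical_values


# Toy data: column 0 is numerical, column 1 is categorical.
dataset = np.array([[1.0, 0.0], [3.0, 1.0], [5.0, 1.0], [7.0, 1.0]])
numerical, categorical = compute_sampling_values(dataset, [1])
```

In this toy example, numerical holds a single (mean, standard deviation) pair for column 0, and categorical holds the unique values of column 1 together with their relative frequencies.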
Methods
sample(data_row, …): Samples new data from a normal distribution.
sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, samples_number: int = 50) → numpy.ndarray
Samples new data from a normal distribution.
If the data_row parameter is given, the sample will be centered around that data point. Otherwise, when the data_row parameter is None, the sample will be generated around the mean of the dataset used to initialise this class.
Numerical features are sampled around their corresponding values in the data_row parameter, or around the mean of that feature in the dataset, using the standard deviation calculated from the dataset. Categorical features are sampled by choosing, with replacement, from all the possible values of that feature, with the probability of sampling each value equal to that value’s frequency in the dataset. (This means that any particular value of a categorical feature in a data_row is ignored.)
For the documentation of parameters, warnings and errors, please see the description of the sample method in the parent fatf.utils.data.augmentation.Augmentation class.
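The sampling procedure described above can be sketched with NumPy alone. This is an illustration under stated assumptions rather than the library's code: the function normal_sample_sketch and its seed argument are hypothetical, the dataset is assumed to be a plain 2-dimensional numerical array with categorical features encoded as numbers, and the population standard deviation is assumed.

```python
import numpy as np


def normal_sample_sketch(dataset, categorical_indices, data_row=None,
                         samples_number=50, seed=None):
    """Hypothetical sketch of the sampling logic described in the text."""
    rng = np.random.default_rng(seed)
    categorical = set(categorical_indices)
    samples = np.zeros((samples_number, dataset.shape[1]), dtype=float)
    for index in range(dataset.shape[1]):
        column = dataset[:, index]
        if index in categorical:
            # Draw with replacement according to dataset frequencies;
            # the value in data_row (if any) is ignored.
            values, counts = np.unique(column, return_counts=True)
            samples[:, index] = rng.choice(
                values, size=samples_number, replace=True,
                p=counts / counts.sum())
        else:
            # Centre on the data_row value if given, else on the column mean;
            # the spread always comes from the whole dataset.
            centre = column.mean() if data_row is None else data_row[index]
            samples[:, index] = rng.normal(
                centre, column.std(), size=samples_number)
    return samples


dataset = np.array([[1.0, 0.0], [3.0, 1.0], [5.0, 1.0], [7.0, 1.0]])
around_mean = normal_sample_sketch(dataset, [1], samples_number=1000, seed=42)
around_row = normal_sample_sketch(
    dataset, [1], data_row=dataset[0], samples_number=1000, seed=42)
```

Here around_mean is centred on the mean of the numerical column, while around_row is centred on the first row's numerical value. In both cases the categorical column is drawn from the dataset-wide frequencies, so the category in the supplied row has no effect.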