fatf.utils.data.augmentation.TruncatedNormalSampling
class fatf.utils.data.augmentation.TruncatedNormalSampling(dataset: numpy.ndarray, categorical_indices: Optional[List[Union[str, int]]] = None, int_to_float: bool = True)
Sampling data from a truncated normal distribution.
New in version 0.0.2.
This class samples data from a truncated normal distribution. The sampling can be performed either around a particular data point (by supplying the data_row parameter to the sample method) or around the mean of the whole dataset (if data_row is not given when calling the sample method). In both cases, the standard deviation of each numerical feature is calculated over the whole dataset, and the minimum and maximum of each numerical feature serve as the bounds of the truncated normal distribution. Categorical features are sampled with replacement, with the probability of each unique value given by the frequency of its appearance in the dataset.

For additional parameters, attributes, warnings and exceptions raised by this class please see the documentation of its parent class: fatf.utils.data.augmentation.Augmentation.

Attributes
- numerical_sampling_values : Dictionary[column index, Tuple[number, number, number, number]]
  Dictionary mapping numerical column feature indices to tuples of four numbers: the column's mean, standard deviation, minimum and maximum value.
- categorical_sampling_values : Dictionary[column index, Tuple[numpy.ndarray, numpy.ndarray]]
  Dictionary mapping categorical column feature indices to tuples consisting of two 1-dimensional numpy arrays: one with the unique values of that column and the other with their normalised (summing up to 1) frequencies.
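
The two attributes above can be pictured with a small NumPy sketch. It is not the fatf implementation: describe_columns is a hypothetical helper, the toy dataset is made up for illustration, and the use of the population standard deviation (NumPy's default) is an assumption, since this page does not say which estimator the class uses.

```python
import numpy as np


def describe_columns(dataset, categorical_indices):
    """Hypothetical helper (not part of fatf): per-column sampling statistics."""
    numerical, categorical = {}, {}
    for idx in range(dataset.shape[1]):
        column = dataset[:, idx]
        if idx in categorical_indices:
            # Unique values and their normalised frequencies (summing up to 1).
            values, counts = np.unique(column, return_counts=True)
            categorical[idx] = (values, counts / counts.sum())
        else:
            column = column.astype(float)
            # Assumption: population standard deviation (ddof=0).
            numerical[idx] = (column.mean(), column.std(),
                              column.min(), column.max())
    return numerical, categorical


# Toy dataset: columns 0 and 2 are numerical, column 1 is categorical.
dataset = np.array([
    [1.0, 0, 10.0],
    [2.0, 1, 20.0],
    [3.0, 1, 30.0],
    [4.0, 1, 40.0],
])
numerical_stats, categorical_stats = describe_columns(
    dataset, categorical_indices=[1])
```

Here numerical_stats plays the role of numerical_sampling_values (keys 0 and 2) and categorical_stats that of categorical_sampling_values (key 1).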
Methods
sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, …): Samples new data from a truncated normal distribution.
sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, samples_number: int = 50) → numpy.ndarray
Samples new data from a truncated normal distribution.
If the data_row parameter is given, the sample will be centered around that data point. Otherwise, when the data_row parameter is None, the sample will be generated around the mean of the dataset used to initialise this class.

Numerical features are sampled around their corresponding values in the data_row parameter, or around the mean of that feature in the dataset, using the standard deviation, minimum and maximum values calculated from the dataset. Categorical features are sampled with replacement from all the possible values of that feature, with the probability of sampling each value equal to that value's frequency in the dataset. (This means that any particular value of a categorical feature in a data_row is ignored.)

For the documentation of parameters, warnings and errors please see the description of the fatf.utils.data.augmentation.Augmentation.sample method in the parent fatf.utils.data.augmentation.Augmentation class.
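
As a rough illustration of the sampling scheme described above, the sketch below draws one numerical and one categorical feature with plain NumPy. It is a minimal sketch under stated assumptions, not the library's code: both helper functions are hypothetical, simple rejection sampling stands in for whatever truncated normal routine fatf actually uses, and the statistics are example values.

```python
import numpy as np


def sample_truncated_normal(rng, centre, std, lower, upper, size):
    """Hypothetical helper: rejection sampling from N(centre, std) on [lower, upper].

    Reasonable when centre lies within (or near) the bounds; a centre far
    outside them would make the rejection loop very slow.
    """
    if std == 0:
        # Degenerate column: every value is identical.
        return np.full(size, float(np.clip(centre, lower, upper)))
    accepted = np.empty(0)
    while accepted.size < size:
        draws = rng.normal(centre, std, size=2 * size)
        inside = draws[(draws >= lower) & (draws <= upper)]
        accepted = np.concatenate([accepted, inside])
    return accepted[:size]


def sample_categorical(rng, values, frequencies, size):
    """Hypothetical helper: sample with replacement, weighted by frequency."""
    return rng.choice(values, size=size, replace=True, p=frequencies)


rng = np.random.default_rng(42)

# Example statistics (mean or data_row value, std, min, max) for one column.
numeric = sample_truncated_normal(
    rng, centre=2.5, std=1.118, lower=1.0, upper=4.0, size=50)
# Example unique values and normalised frequencies for one column.
category = sample_categorical(
    rng, np.array([0, 1]), np.array([0.25, 0.75]), size=50)

samples = np.column_stack([numeric, category])
```

To mimic sampling around a data point rather than the dataset mean, centre would be set to that row's value for the feature while std, lower and upper stay at their dataset-wide values. The categorical draw does not depend on the row at all, matching the note above that categorical values in a data_row are ignored.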