fatf.utils.data.augmentation.Augmentation
class fatf.utils.data.augmentation.Augmentation(dataset: numpy.ndarray, ground_truth: Optional[numpy.ndarray] = None, categorical_indices: Optional[numpy.ndarray] = None, int_to_float: bool = True)
An abstract class for implementing data augmentation methods.
An abstract class that all augmentation classes should inherit from. It contains the abstract __init__ and sample methods and an input validator, _validate_sample_input, for the sample method. The input parameters of the initialisation method are validated by the fatf.utils.data.augmentation._validate_input function.
Note
The _validate_sample_input method should be called in every implementation of the sample method in child classes to ensure that all of its input parameters are valid.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array with a dataset to be used for sampling.
- ground_truth : numpy.ndarray, optional (default=None)
A 1-dimensional numpy array with labels for the supplied dataset.
- categorical_indices : List[column indices], optional (default=None)
A list of column indices that should be treated as categorical features. If None is given, this will be inferred from the data array: string-based columns will be treated as categorical features and numerical columns will be treated as numerical features.
- int_to_float : boolean, optional (default=True)
If True, all of the integer dtype columns in the dataset will be generalised to the numpy.float64 type. Otherwise, integer type columns will remain integer and floating point type columns will remain floating point.
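The inference rule for categorical_indices can be illustrated with a short sketch. The helper infer_feature_indices below is hypothetical and not part of the fatf API; it assumes that a column counts as string-based when its numpy dtype kind is 'U' or 'S', and it also mimics the UserWarning described under Warns for string columns the user did not list.

```python
import warnings
from typing import List, Optional, Tuple

import numpy as np


def infer_feature_indices(
        dataset: np.ndarray,
        categorical_indices: Optional[List] = None) -> Tuple[List, List]:
    """Hypothetical helper: split columns into categorical and numerical.

    Column indices are field names for structured arrays and integers
    for classic 2-D arrays.
    """
    if dataset.dtype.names is not None:  # structured array
        columns = list(dataset.dtype.names)
        string_columns = [name for name in columns
                          if dataset.dtype[name].kind in ('U', 'S')]
    else:
        columns = list(range(dataset.shape[1]))
        # A classic array has one dtype shared by all of its columns.
        is_string = dataset.dtype.kind in ('U', 'S')
        string_columns = list(columns) if is_string else []

    if categorical_indices is None:
        # Infer: string-based columns are categorical.
        categorical = string_columns
    else:
        categorical = list(categorical_indices)
        missing = [c for c in string_columns if c not in categorical]
        if missing:
            warnings.warn('String-based columns {} were not indicated as '
                          'categorical; adding them.'.format(missing),
                          UserWarning)
            categorical += missing

    # Everything that is not categorical is treated as numerical.
    numerical = [c for c in columns if c not in categorical]
    return categorical, numerical


data = np.array([(1, 0.5, 'a'), (2, 1.5, 'b')],
                dtype=[('age', 'i4'), ('score', 'f8'), ('group', 'U1')])
categorical, numerical = infer_feature_indices(data)
print(categorical, numerical)  # ['group'] ['age', 'score']
```

For a classic (non-structured) array every column shares one dtype, so the sketch marks either all columns or none of them as string-based.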
- Attributes
- dataset : numpy.ndarray
A 2-dimensional numpy array with a dataset to be used for sampling.
- data_points_number : integer
The number of data points in the dataset.
- is_structured : boolean
True if the dataset is a structured numpy array, False otherwise.
- ground_truth : Union[numpy.ndarray, None]
A 1-dimensional numpy array with labels for the supplied dataset.
- categorical_indices : List[column indices]
A list of column indices that should be treated as categorical features.
- numerical_indices : List[column indices]
A list of column indices that should be treated as numerical features.
- features_number : integer
The number of features (columns) in the input dataset.
- sample_dtype : Union[numpy.dtype, List[Tuple[string, numpy.dtype]]]
A dtype with numerical dtypes (in case of a structured data array) generalised to support the assignment of sampled values. For example, if the dtype of a numerical feature is int and the sampling generates float, this dtype will generalise the type of that column to float.
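One way the sample_dtype generalisation could be computed is sketched below, assuming that integer numerical columns are promoted to numpy.float64 when int_to_float is True and that every other column keeps its dtype. The function generalise_sample_dtype is hypothetical; the library's own logic may differ in its details.

```python
from typing import List, Tuple, Union

import numpy as np


def generalise_sample_dtype(
        dataset: np.ndarray,
        numerical_indices: List,
        int_to_float: bool = True
) -> Union[np.dtype, List[Tuple[str, np.dtype]]]:
    """Hypothetical sketch of deriving a ``sample_dtype``."""

    def generalise(dtype: np.dtype) -> np.dtype:
        # Promote (unsigned) integers to float64 if requested.
        if int_to_float and dtype.kind in ('i', 'u'):
            return np.dtype(np.float64)
        return dtype

    if dataset.dtype.names is not None:  # structured array
        return [(name, generalise(dataset.dtype[name])
                 if name in numerical_indices else dataset.dtype[name])
                for name in dataset.dtype.names]
    # A classic array shares one dtype across all of its columns.
    return generalise(dataset.dtype) if numerical_indices else dataset.dtype


data = np.array([(1, 'a'), (2, 'b')], dtype=[('age', 'i4'), ('group', 'U1')])
sample_dtype = generalise_sample_dtype(data, ['age'])
sampled = np.zeros(2, dtype=sample_dtype)
sampled['age'] = [0.5, 1.5]  # kept as floats, not truncated to 0 and 1
```

With the original int32 field, the same assignment would silently truncate the sampled values to 0 and 1, which is why sampled arrays need the generalised dtype.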
- Raises
- IncorrectShapeError
The input dataset is not a 2-dimensional numpy array. The ground_truth array is not a 1-dimensional numpy array. The number of ground truth annotations is different from the number of rows in the data array.
- IndexError
Some of the column indices given in the categorical_indices parameter are not valid for the input dataset.
- TypeError
The categorical_indices parameter is neither a list nor None. The dataset or the ground_truth array (if not None) is not of a base (numerical and/or string) type. The int_to_float parameter is not a boolean.
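The errors listed above correspond to a handful of input checks. In the minimal sketch below, IncorrectShapeError is redefined locally as a stand-in for the library's exception, validate_input is a hypothetical name (the documented validator is fatf.utils.data.augmentation._validate_input), and both the set of accepted "base" dtype kinds and the order of the checks are assumptions.

```python
from typing import List, Optional

import numpy as np


class IncorrectShapeError(Exception):
    """Local stand-in for the library's IncorrectShapeError."""


# Assumed "base" dtype kinds: integers, floats and strings.
BASE_KINDS = ('i', 'u', 'f', 'U', 'S')


def is_base_array(array: np.ndarray) -> bool:
    if array.dtype.names is not None:  # check every field
        return all(array.dtype[name].kind in BASE_KINDS
                   for name in array.dtype.names)
    return array.dtype.kind in BASE_KINDS


def validate_input(dataset: np.ndarray,
                   ground_truth: Optional[np.ndarray] = None,
                   categorical_indices: Optional[List] = None,
                   int_to_float: bool = True) -> bool:
    """Hypothetical validator mirroring the documented errors."""
    structured = dataset.dtype.names is not None
    # A structured array is 1-D with named fields; a classic one is 2-D.
    if dataset.ndim != (1 if structured else 2):
        raise IncorrectShapeError('The dataset must be a 2-dimensional array.')
    if not is_base_array(dataset):
        raise TypeError('The dataset must be of a base type.')

    if ground_truth is not None:
        if ground_truth.ndim != 1:
            raise IncorrectShapeError('The ground_truth must be 1-dimensional.')
        if not is_base_array(ground_truth):
            raise TypeError('The ground_truth must be of a base type.')
        if ground_truth.shape[0] != dataset.shape[0]:
            raise IncorrectShapeError('The number of labels must match the '
                                      'number of rows in the dataset.')

    if categorical_indices is not None:
        if not isinstance(categorical_indices, list):
            raise TypeError('The categorical_indices must be a list or None.')
        valid = (set(dataset.dtype.names) if structured
                 else set(range(dataset.shape[1])))
        invalid = [i for i in categorical_indices if i not in valid]
        if invalid:
            raise IndexError('Invalid column indices: {}.'.format(invalid))

    if not isinstance(int_to_float, bool):
        raise TypeError('The int_to_float parameter must be a boolean.')
    return True
```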
- Warns
- UserWarning
If some of the string-based columns in the input data array were not indicated to be categorical features by the user (via the categorical_indices parameter), the user is warned that they will be added to the list of categorical features.
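The inheritance pattern described above (abstract __init__ and sample methods, plus a shared validator that every sample implementation calls) can be illustrated with a stripped-down, self-contained stand-in built on Python's abc module. The class names AugmentationSketch and GaussianSampling, and the specific checks inside _validate_sample_input, are hypothetical. The child samples from a normal distribution centred either on the dataset mean or on data_row, which is only one possible augmentation strategy.

```python
import abc
from typing import Optional

import numpy as np


class AugmentationSketch(abc.ABC):
    """Hypothetical, stripped-down stand-in for the abstract base class."""

    @abc.abstractmethod
    def __init__(self, dataset: np.ndarray) -> None:
        # Abstract, yet concrete enough for children to reuse via super().
        if dataset.ndim != 2:
            raise ValueError('The dataset must be a 2-dimensional array.')
        self.dataset = dataset
        self.data_points_number, self.features_number = dataset.shape

    def _validate_sample_input(self, data_row: Optional[np.ndarray],
                               samples_number: int) -> bool:
        # Shared checks that every child ``sample`` implementation should run.
        if data_row is not None and data_row.shape != (self.features_number,):
            raise ValueError('The data_row must have one value per feature.')
        if not isinstance(samples_number, int) or samples_number < 1:
            raise ValueError('The samples_number must be a positive integer.')
        return True

    @abc.abstractmethod
    def sample(self, data_row: Optional[np.ndarray] = None,
               samples_number: int = 50) -> np.ndarray:
        raise NotImplementedError('The sample method is not implemented.')


class GaussianSampling(AugmentationSketch):
    """Hypothetical child class for purely numerical data."""

    def __init__(self, dataset: np.ndarray, seed: int = 0) -> None:
        super().__init__(dataset)
        self.mean = dataset.mean(axis=0)
        self.std = dataset.std(axis=0)
        self._rng = np.random.default_rng(seed)

    def sample(self, data_row: Optional[np.ndarray] = None,
               samples_number: int = 50) -> np.ndarray:
        self._validate_sample_input(data_row, samples_number)
        # Whole-dataset mode centres on the mean; local mode on data_row.
        centre = self.mean if data_row is None else data_row
        return self._rng.normal(centre, self.std,
                                size=(samples_number, self.features_number))
```

Because AugmentationSketch leaves abstract methods unimplemented, Python refuses to instantiate it directly and raises TypeError; only a child that overrides both __init__ and sample can be created.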
Methods
sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, …) Samples a given number of data points based on the initialisation data.
sample(data_row: Union[numpy.ndarray, numpy.void, None] = None, samples_number: int = 50) → numpy.ndarray
Samples a given number of data points based on the initialisation data.
This is an abstract method that must be implemented by each child class. It should provide two modes of operation: if data_row is None, the sample should be drawn from the distribution of the whole dataset that was used to initialise this class; and if data_row is a numpy array with a data point, the sample should be drawn from the vicinity of this data point.
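The two modes can also be served by a non-parametric strategy. The hypothetical helper neighbourhood_bootstrap below works on a purely numerical 2-D array: with data_row set to None it resamples rows of the whole dataset, and otherwise it resamples only among the `neighbours` rows closest to data_row (by Euclidean distance), which is one simple reading of "the vicinity of this data point". It is an illustration, not the library's implementation.

```python
from typing import Optional

import numpy as np


def neighbourhood_bootstrap(dataset: np.ndarray,
                            data_row: Optional[np.ndarray] = None,
                            samples_number: int = 50,
                            neighbours: int = 5,
                            seed: int = 0) -> np.ndarray:
    """Hypothetical two-mode sampler for a numerical 2-D dataset."""
    rng = np.random.default_rng(seed)
    if data_row is None:
        # Global mode: resample rows from the whole dataset.
        pool = dataset
    else:
        # Local mode: resample only among the rows closest to data_row.
        distances = np.linalg.norm(dataset - data_row, axis=1)
        k = min(neighbours, dataset.shape[0])
        pool = dataset[np.argsort(distances)[:k]]
    indices = rng.integers(0, pool.shape[0], size=samples_number)
    return pool[indices]


X = np.arange(20, dtype=float).reshape(10, 2)  # rows [0, 1], [2, 3], ...
global_sample = neighbourhood_bootstrap(X, samples_number=100)
local_sample = neighbourhood_bootstrap(X, np.array([0.0, 1.0]),
                                       samples_number=100, neighbours=2)
```

Only existing rows are ever returned, so this strategy never produces values outside the observed data; a parametric sampler trades that property for smoother coverage.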
- Parameters
- data_row : Union[numpy.ndarray, numpy.void], optional (default=None)
A data point. If given, the sample will be generated around that point.
- samples_number : integer, optional (default=50)
The number of samples to be generated.
- Returns
- samples : numpy.ndarray
Sampled data.
- Raises
- NotImplementedError
This is an abstract method and has not been implemented.