fatf.utils.data.discretisation.QuartileDiscretiser
class fatf.utils.data.discretisation.QuartileDiscretiser(dataset: numpy.ndarray, categorical_indices: Optional[List[Union[str, int]]] = None, feature_names: Optional[List[str]] = None)
Discretises selected numerical features of the dataset into quartiles.
New in version 0.0.2.
This class discretises numerical columns (features) of the dataset by mapping their values onto the ids of the quartiles to which they belong. The quartile boundaries are computed based on the dataset used to initialise this class.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array with a dataset to be discretised.
- categorical_indices : List[column indices], optional (default=None)
A list of column indices that should be treated as categorical features. If None is given, this will be inferred from the dataset array: string-based columns will be treated as categorical features and numerical columns will be treated as numerical features.
- feature_names : List[strings], optional (default=None)
A list of feature names in the order they appear in the dataset array. If None, this will be extracted from the dataset array. For structured arrays these will be the column names extracted from the dtype; for classic arrays these will be numbers indicating the column index in the array.
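The inference described for categorical_indices and feature_names can be illustrated with plain numpy. This is a minimal sketch, not the library's implementation: the helper name infer_column_kinds is hypothetical, and treating the dtype kinds 'b', 'i', 'u' and 'f' as numerical is an assumption made for the example.

```python
import numpy as np


def infer_column_kinds(dataset):
    """Hypothetical helper: return (categorical, numerical) column indices."""
    if dataset.dtype.names is not None:
        # Structured array: columns are indexed by their field names.
        indices = list(dataset.dtype.names)
        kinds = [dataset.dtype[name].kind for name in indices]
    else:
        # Classic array: columns are indexed by position and share one dtype.
        indices = list(range(dataset.shape[1]))
        kinds = [dataset.dtype.kind] * len(indices)
    # 'S' and 'U' are the byte-string and unicode-string dtype kinds.
    categorical = [i for i, k in zip(indices, kinds) if k in 'SU']
    # Assumption: boolean, integer and float kinds count as numerical.
    numerical = [i for i, k in zip(indices, kinds) if k in 'biuf']
    return categorical, numerical


structured = np.array(
    [(1.5, 'a', 3), (2.5, 'b', 4)],
    dtype=[('height', 'f8'), ('group', 'U1'), ('age', 'i8')])
classic = np.array([[0.1, 2.0], [0.3, 4.0]])

structured_kinds = infer_column_kinds(structured)
classic_kinds = infer_column_kinds(classic)
```

In this sketch the structured array yields string indices taken from the dtype, while the classic array yields integer positions, mirroring the two kinds of column index accepted by categorical_indices.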
- Attributes
- dataset_dtype : numpy.dtype
The dtype of the input dataset.
- is_structured : boolean
True if the input dataset is a structured numpy array, False otherwise.
- features_number : integer
The number of features (columns) in the input dataset.
- categorical_indices : List[Column Indices]
A list of column indices that should be treated as categorical features.
- numerical_indices : List[Column Indices]
A list of column indices that should be treated as numerical features.
- feature_names_map : Dict[Column Index, String]
A dictionary that maps column (feature) indices to their names (feature names). If the feature_names parameter was not given (None), the feature names are inferred from the dataset.
- feature_value_names : Dictionary[Index, Dictionary[Integer, String]]
A dictionary mapping dataset column (feature) indices to dictionaries holding the quartile description (value) of each quartile id (key) for that feature.
- feature_bin_boundaries : Dictionary[Index, numpy.ndarray]
A dictionary mapping dataset column (feature) indices to numpy arrays holding quartile bin boundaries (with the upper boundary inclusive) for each feature.
- discretised_dtype : numpy.dtype
The dtype of the discretised arrays output by the discretise method.
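The quartile boundaries, quartile ids and quartile descriptions listed above can be sketched for a single numerical column using numpy alone. This is an illustration under stated assumptions rather than the class's actual code: using the 25th, 50th and 75th percentiles as boundaries, the helper describe_bins and the format of its description strings are all assumptions.

```python
import numpy as np

# Toy numerical column; the values are made up for illustration.
column = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

# Assumption: three inner boundaries split the column into four quartile bins.
boundaries = np.percentile(column, [25, 50, 75])

# side='left' places a value equal to a boundary in the lower bin,
# which makes the upper boundary of each bin inclusive.
quartile_ids = np.searchsorted(boundaries, column, side='left')


def describe_bins(name, bins):
    """Hypothetical helper: one readable description per quartile id."""
    descriptions = {0: f'{name} <= {bins[0]:.2f}'}
    for i in range(1, len(bins)):
        descriptions[i] = f'{bins[i - 1]:.2f} < {name} <= {bins[i]:.2f}'
    descriptions[len(bins)] = f'{bins[-1]:.2f} < {name}'
    return descriptions


value_names = describe_bins('x', boundaries)
```

Here boundaries plays the role of one entry of feature_bin_boundaries and value_names the role of one entry of feature_value_names; the real attributes hold one such entry per numerical column.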
- Raises
- IncorrectShapeError
The input dataset is not a 2-dimensional numpy array.
- IndexError
Some of the column indices given in the categorical_indices list are invalid for the input dataset.
- TypeError
The dataset is not of a base (numerical and/or string) type. The categorical_indices is neither a Python list nor None. The feature_names is neither a Python list nor None, or one of its elements (if it is a list) is not a string.
- ValueError
The length of the feature_names list is different from the number of columns (features) in the input dataset.
- Warns
- UserWarning
If some of the string-based columns in the input data array were not indicated to be categorical features by the user (via the categorical_indices parameter), the user is warned that they will be added to the list of categorical features.
Methods
discretise(dataset): Discretises numerical features of the dataset into quartiles.
discretise(dataset: Union[numpy.ndarray, numpy.void]) → Union[numpy.ndarray, numpy.void]
Discretises numerical features of the dataset into quartiles.
- Parameters
- dataset : Union[numpy.ndarray, numpy.void]
A data point (1-D) or an array (2-D) of data points to be discretised.
- Raises
- IncorrectShapeError
The input dataset is neither a 1- nor a 2-dimensional numpy array. The number of features (columns) in the input dataset is different from the number of features in the dataset used to initialise this object.
- TypeError
The dtype of the input dataset is too different from the dtype of the dataset used to initialise this object.
- Returns
- discretised_data : Union[numpy.ndarray, numpy.void]
A discretised data array.
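The behaviour of discretise on a single data point (1-D) and on an array of data points (2-D) can be sketched as follows. This is a simplified stand-in and not the method itself: discretise_sketch is a hypothetical function, it handles only classic (unstructured) arrays, it raises a plain ValueError instead of IncorrectShapeError, and the boundaries are supplied by hand rather than computed at initialisation.

```python
import numpy as np


def discretise_sketch(data, boundaries):
    """Hypothetical stand-in: map numerical columns to quartile ids.

    `boundaries` maps a column index to its quartile boundaries;
    columns without an entry are left untouched.
    """
    data = np.asarray(data)
    if data.ndim not in (1, 2):
        raise ValueError('Expected a 1-D data point or a 2-D array.')
    discretised = data.copy()
    for index, bins in boundaries.items():
        # `...` selects the column in both the 1-D and the 2-D case.
        discretised[..., index] = np.searchsorted(
            bins, data[..., index], side='left')
    return discretised


# Assumed boundaries for column 0; column 1 is left as it is.
boundaries = {0: np.array([3.0, 5.0, 7.0])}
array_2d = np.array([[1.0, 10.0], [5.0, 20.0], [8.0, 30.0]])
point_1d = array_2d[1]

ids_2d = discretise_sketch(array_2d, boundaries)
ids_1d = discretise_sketch(point_1d, boundaries)
```

The same boundaries are reused for every call, which reflects the point made above that bins are fixed by the dataset used to initialise the object and not by the data passed to discretise.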