fatf.utils.data.discretisation.QuartileDiscretiser
class fatf.utils.data.discretisation.QuartileDiscretiser(dataset: numpy.ndarray, categorical_indices: Optional[List[Union[str, int]]] = None, feature_names: Optional[List[str]] = None)

Discretises selected numerical features of the dataset into quartiles.

New in version 0.0.2.
This class discretises numerical columns (features) of the dataset by mapping their values onto the ids of the quartiles to which they belong. The quartile boundaries are computed from the dataset used to initialise this class.

- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array with a dataset to be discretised.
- categorical_indices : List[column indices], optional (default=None)
A list of column indices that should be treated as categorical features. If None is given, this will be inferred from the dataset array: string-based columns will be treated as categorical features and numerical columns will be treated as numerical features.
- feature_names : List[strings], optional (default=None)
A list of feature names in the order they appear in the dataset array. If None, they will be extracted from the dataset array. For structured arrays these will be the column names extracted from the dtype; for classic arrays these will be numbers indicating the column index in the array.
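The default inference rules for categorical_indices and feature_names can be sketched in plain numpy. This is a hypothetical illustration, not fatf's code. The helper infer_column_roles is made up, and string-based columns are assumed to be those whose dtype kind is 'U' or 'S'. The string form of the classic-array names ('0', '1', ...) is also an assumption.

```python
import numpy as np


def infer_column_roles(dataset):
    """Split the columns of ``dataset`` into categorical and numerical ones.

    Hypothetical helper illustrating the default rules; it also returns the
    default feature names. It is not fatf's implementation.
    """
    if dataset.dtype.names is not None:
        # Structured array: every named field is a column, indexed by name.
        indices = list(dataset.dtype.names)
        feature_names = list(dataset.dtype.names)
        # A field is string-based if its kind is 'U' (unicode) or 'S' (bytes).
        is_string = {name: dataset.dtype[name].kind in 'US'
                     for name in indices}
    else:
        # Classic array: one dtype shared by all columns, integer indices.
        indices = list(range(dataset.shape[1]))
        # Assumed string form of the column-index names.
        feature_names = [str(i) for i in indices]
        is_string = {i: dataset.dtype.kind in 'US' for i in indices}
    categorical = [i for i in indices if is_string[i]]
    numerical = [i for i in indices if not is_string[i]]
    return categorical, numerical, feature_names
```

For a classic array, all columns share one dtype. Such an array is therefore either entirely categorical or entirely numerical, and only structured arrays can mix the two.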
- Attributes
- dataset_dtype : numpy.dtype
The dtype of the input dataset.
- is_structured : boolean
True if the input dataset is a structured numpy array, False otherwise.
- features_number : integer
The number of features (columns) in the input dataset.
- categorical_indices : List[Column Indices]
A list of column indices that should be treated as categorical features.
- numerical_indices : List[Column Indices]
A list of column indices that should be treated as numerical features.
- feature_names_map : Dict[Column Index, String]
A dictionary that holds a mapping of column (feature) indices to their names (feature names). If the feature_names parameter was not given (None), the feature names are inferred from the dataset.
- feature_value_names : Dictionary[Index, Dictionary[Integer, String]]
A dictionary mapping dataset column (feature) indices to dictionaries that hold, for each quartile id (key), a description of that quartile (value).
- feature_bin_boundaries : Dictionary[Index, numpy.ndarray]
A dictionary mapping dataset column (feature) indices to numpy arrays holding the quartile bin boundaries for each feature (the upper boundary of each bin is inclusive).
- discretised_dtype : numpy.dtype
The dtype of the discretised arrays output by the discretise method.
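The two per-feature attributes above can be illustrated with a short sketch. It assumes that the quartile boundaries are the 25th, 50th and 75th percentiles returned by numpy.percentile, with its default linear interpolation. It also assumes that the descriptions follow an interval format with two decimal places. The helper names and the exact label format are hypothetical, and only classic (non-structured) arrays are handled.

```python
import numpy as np


def quartile_boundaries(column):
    """Return the 25th, 50th and 75th percentiles of a numerical column."""
    return np.percentile(column, [25, 50, 75])


def quartile_names(feature_name, boundaries):
    """Describe each quartile id (0 to 3) as an interval.

    Every bin includes its upper boundary, so the first bin is
    ``name <= b0`` and the last one is ``name > b2``.
    """
    names = {0: f'{feature_name} <= {boundaries[0]:.2f}'}
    for i in range(len(boundaries) - 1):
        names[i + 1] = (f'{boundaries[i]:.2f} < {feature_name} '
                        f'<= {boundaries[i + 1]:.2f}')
    names[len(boundaries)] = f'{feature_name} > {boundaries[-1]:.2f}'
    return names


def build_quartile_attributes(dataset, numerical_indices, feature_names_map):
    """Build the boundaries and value-name dictionaries for a 2-D array."""
    bin_boundaries = {}
    value_names = {}
    for index in numerical_indices:
        boundaries = quartile_boundaries(dataset[:, index])
        bin_boundaries[index] = boundaries
        value_names[index] = quartile_names(feature_names_map[index],
                                            boundaries)
    return bin_boundaries, value_names
```

For example, the column [1, 2, 3, 4, 5] has boundaries [2.0, 3.0, 4.0], so quartile id 1 is described as '2.00 < x <= 3.00'.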
- Raises
- IncorrectShapeError
The input dataset is not a 2-dimensional numpy array.
- IndexError
Some of the column indices given in the categorical_indices list are invalid for the input dataset.
- TypeError
The dataset is not of a base (numerical and/or string) type. The categorical_indices parameter is neither a Python list nor None. The feature_names parameter is neither a Python list nor None, or one of its elements (if it is a list) is not a string.
- ValueError
The length of the feature_names list is different from the number of columns (features) in the input dataset.
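The constructor checks listed under Raises can be roughly sketched for classic 2-D arrays. IncorrectShapeError is defined locally as a stand-in for fatf's exception, and validate_input is a hypothetical name. The set of dtype kinds accepted as "base" types is also an assumption.

```python
import numpy as np


class IncorrectShapeError(Exception):
    """Local stand-in for fatf's IncorrectShapeError exception."""


# Assumed set of "base" dtype kinds: booleans, integers, floats and strings.
BASE_KINDS = set('biufUS')


def validate_input(dataset, categorical_indices=None, feature_names=None):
    """Re-create the documented constructor checks for a classic 2-D array.

    Hypothetical helper; structured arrays (1-D arrays of records) are not
    handled here.
    """
    if not isinstance(dataset, np.ndarray) or dataset.ndim != 2:
        raise IncorrectShapeError(
            'The input dataset must be a 2-dimensional numpy array.')
    if dataset.dtype.kind not in BASE_KINDS:
        raise TypeError(
            'The dataset must be of a base (numerical and/or string) type.')
    features_number = dataset.shape[1]
    if categorical_indices is not None:
        if not isinstance(categorical_indices, list):
            raise TypeError('categorical_indices must be a list or None.')
        invalid = [i for i in categorical_indices
                   if not isinstance(i, int) or not 0 <= i < features_number]
        if invalid:
            raise IndexError(f'Invalid column indices: {invalid}.')
    if feature_names is not None:
        if (not isinstance(feature_names, list)
                or not all(isinstance(name, str) for name in feature_names)):
            raise TypeError('feature_names must be a list of strings or None.')
        if len(feature_names) != features_number:
            raise ValueError('feature_names must have one entry per column.')
```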
- Warns
- UserWarning
If some of the string-based columns in the input data array were not indicated to be categorical features by the user (via the categorical_indices parameter), the user is warned that they will be added to the list of categorical features.
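The warning behaviour can be sketched for structured arrays as follows. complete_categorical_indices and the warning text are hypothetical. Only the use of a UserWarning and the extension of the categorical list come from the description above.

```python
import warnings

import numpy as np


def complete_categorical_indices(dataset, categorical_indices):
    """Add string-based fields the user did not mark as categorical.

    Hypothetical helper for structured arrays; it emits a UserWarning when
    it has to extend the user's list.
    """
    # String-based fields have dtype kind 'U' (unicode) or 'S' (bytes).
    string_fields = [name for name in dataset.dtype.names
                     if dataset.dtype[name].kind in 'US']
    missing = [name for name in string_fields
               if name not in categorical_indices]
    if missing:
        warnings.warn(
            f'String-based columns {missing} were not listed in '
            'categorical_indices; they will be treated as categorical.',
            UserWarning)
    return list(categorical_indices) + missing
```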
Methods
discretise(dataset): Discretises numerical features of the dataset into quartiles.
discretise(dataset: Union[numpy.ndarray, numpy.void]) → Union[numpy.ndarray, numpy.void]

Discretises numerical features of the dataset into quartiles.

- Parameters
- dataset : Union[numpy.ndarray, numpy.void]
A data point (1-D) or an array (2-D) of data points to be discretised.
- Raises
- IncorrectShapeError
The input dataset is neither a 1- nor a 2-dimensional numpy array, or the number of features (columns) in the input dataset is different from the number of features in the dataset used to initialise this object.
- TypeError
The dtype of the input dataset is too different from the dtype of the dataset used to initialise this object.
- Returns
- discretised_data : Union[numpy.ndarray, numpy.void]
A discretised data array.
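The mapping performed by discretise can be sketched with numpy.searchsorted. With side='left', a value equal to a boundary is assigned to the lower bin, which matches bins whose upper boundary is inclusive. This is a sketch under several assumptions rather than fatf's implementation:

- discretise_array is a hypothetical name, and only classic arrays are handled.
- The quartile ids are written into a copy of the input, so a float input yields float ids rather than the library's discretised_dtype.
- A plain ValueError replaces fatf's IncorrectShapeError.

```python
import numpy as np


def discretise_array(dataset, bin_boundaries):
    """Discretise a 1-D data point or a 2-D array of data points.

    ``bin_boundaries`` maps numerical column indices to their quartile
    boundaries; the remaining (categorical) columns are copied unchanged.
    """
    if dataset.ndim not in (1, 2):
        # fatf raises IncorrectShapeError here; ValueError is a stand-in.
        raise ValueError('Expected a 1- or 2-dimensional array.')
    is_single_point = dataset.ndim == 1
    data = np.atleast_2d(dataset)
    discretised = data.copy()
    for index, boundaries in bin_boundaries.items():
        # side='left': a value equal to a boundary goes to the lower bin.
        discretised[:, index] = np.searchsorted(boundaries, data[:, index],
                                                side='left')
    return discretised[0] if is_single_point else discretised
```

For example, with boundaries [2.0, 3.0, 4.0] the values 2.0, 2.5 and 5.0 map to quartile ids 0, 1 and 3.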