fatf.utils.data.discretisation.QuartileDiscretiser

class fatf.utils.data.discretisation.QuartileDiscretiser(dataset: numpy.ndarray, categorical_indices: Optional[List[Union[str, int]]] = None, feature_names: Optional[List[str]] = None)

Discretises selected numerical features of the dataset into quartiles.

New in version 0.0.2.

This class discretises numerical columns (features) of the dataset by mapping their values onto the ids of the quartiles to which they belong. The quartile boundaries are computed from the dataset used to initialise this class.
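As an illustration of the mapping described above, the following is a minimal numpy-only sketch for a single numerical column. It assumes that the boundaries are the 25th, 50th and 75th percentiles of the initialisation data, that quartile ids are numbered from 0, and that each upper boundary is inclusive. The helper name quartile_ids is hypothetical, and the sketch is not the fatf implementation.

```python
import numpy as np


def quartile_ids(values, boundaries):
    """Map values onto quartile ids 0-3 (hypothetical helper, not fatf API)."""
    # side='left' places a value equal to a boundary in the lower bin,
    # i.e. boundaries[i - 1] < value <= boundaries[i] maps to id i.
    return np.searchsorted(boundaries, values, side='left')


# Data standing in for the dataset used to initialise the discretiser.
reference = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# Assumed quartile boundaries: the 25th, 50th and 75th percentiles.
boundaries = np.percentile(reference, [25, 50, 75])

# New values are binned with the boundaries learnt from the reference data.
ids = quartile_ids(np.array([2.0, 2.5, 4.0, 10.0]), boundaries)
print(boundaries.tolist())  # [2.0, 3.0, 4.0]
print(ids.tolist())  # [0, 1, 2, 3]
```

Note that the values 2.0 and 4.0 sit exactly on a boundary and fall into the lower bin, which is what an inclusive upper boundary means in this sketch.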

Parameters
dataset : numpy.ndarray

A 2-dimensional numpy array with a dataset to be discretised.

categorical_indices : List[column indices], optional (default=None)

A list of column indices that should be treated as categorical features. If None is given, this will be inferred from the dataset array: string-based columns will be treated as categorical features and numerical columns will be treated as numerical features.

feature_names : List[strings], optional (default=None)

A list of feature names in the order they appear in the dataset array. If None, this will be extracted from the dataset array. For structured arrays these will be the column names extracted from the dtype; for classic arrays these will be numbers indicating the column index in the array.
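The default split into categorical and numerical features (used when categorical_indices is None) can be sketched as follows. This is a hedged illustration only: the function name split_indices is hypothetical, and treating integer, unsigned integer and float dtype kinds as numerical is an assumption rather than a statement of what fatf accepts.

```python
import numpy as np


def split_indices(dataset):
    """Return (categorical, numerical) column indices (hypothetical helper)."""
    if dataset.dtype.names is not None:
        # Structured array: columns are indexed by their field names.
        indices = list(dataset.dtype.names)
        kinds = [dataset.dtype[name].kind for name in indices]
    else:
        # Classic 2-D array: every column shares the array's dtype.
        indices = list(range(dataset.shape[1]))
        kinds = [dataset.dtype.kind] * len(indices)
    # 'S' and 'U' are numpy's byte-string and unicode-string dtype kinds.
    categorical = [i for i, k in zip(indices, kinds) if k in {'S', 'U'}]
    # Assumption: integer, unsigned integer and float kinds are numerical.
    numerical = [i for i, k in zip(indices, kinds) if k in {'i', 'u', 'f'}]
    return categorical, numerical


structured = np.array(
    [(31.0, 'a'), (42.0, 'b')], dtype=[('age', 'f8'), ('group', 'U1')])
classic = np.array([[1.0, 2.0], [3.0, 4.0]])

print(split_indices(structured))  # (['group'], ['age'])
print(split_indices(classic))  # ([], [0, 1])
```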

Attributes
dataset_dtype : numpy.dtype

The dtype of the input dataset.

is_structured : boolean

True if the input dataset is a structured numpy array, False otherwise.

features_number : integer

The number of features (columns) in the input dataset.

categorical_indices : List[Column Indices]

A list of column indices that should be treated as categorical features.

numerical_indices : List[Column Indices]

A list of column indices that should be treated as numerical features.

feature_names_map : Dict[Column Index, String]

A dictionary that holds mapping of column (feature) indices to their names (feature names). If the feature_names parameter was not given (None), the feature names are inferred from the dataset.

feature_value_names : Dictionary[Index, Dictionary[Integer, String]]

A dictionary mapping dataset column (feature) indices to dictionaries holding quartile description (value) of each quartile id (key) for that feature.

feature_bin_boundaries : Dictionary[Index, numpy.ndarray]

A dictionary mapping dataset column (feature) indices to numpy arrays holding quartile bin boundaries (with the upper boundary inclusive) for each feature.

discretised_dtype : numpy.dtype

The dtype of the discretised arrays output by the discretise method.

Raises
IncorrectShapeError

The input dataset is not a 2-dimensional numpy array.

IndexError

Some of the column indices given in the categorical_indices list are invalid for the input dataset.

TypeError

The dataset is not of a base (numerical and/or string) type. The categorical_indices is neither a Python list nor None. The feature_names is neither a Python list nor None or one of its elements (if it is a list) is not a string.

ValueError

The length of the feature_names list is different from the number of columns (features) in the input dataset.

Warns
UserWarning

If some of the string-based columns in the input data array were not indicated as categorical features by the user (via the categorical_indices parameter), the user is warned that they will be added to the list of categorical features.

Methods

discretise(dataset)

Discretises numerical features of the dataset into quartiles.

discretise(dataset: Union[numpy.ndarray, numpy.void]) → Union[numpy.ndarray, numpy.void]

Discretises numerical features of the dataset into quartiles.

Parameters
dataset : Union[numpy.ndarray, numpy.void]

A data point (1-D) or an array (2-D) of data points to be discretised.

Raises
IncorrectShapeError

The input dataset is neither a 1- nor a 2-dimensional numpy array. The number of features (columns) in the input dataset is different from the number of features in the dataset used to initialise this object.

TypeError

The dtype of the input dataset is too different from the dtype of the dataset used to initialise this object.

Returns
discretised_data : Union[numpy.ndarray, numpy.void]

A discretised data array.
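To illustrate how one routine can accept both a single data point (1-D) and an array of data points (2-D), here is a rough numpy-only sketch for classic (unstructured) arrays. The function discretise_sketch and its bin_boundaries argument are hypothetical stand-ins that mimic the shape of the feature_bin_boundaries attribute. Unlike the real method, the sketch keeps the input dtype, ignores structured arrays, and raises a plain ValueError instead of IncorrectShapeError.

```python
import numpy as np


def discretise_sketch(data, bin_boundaries):
    """Discretise selected columns of a 1-D or 2-D classic array.

    bin_boundaries maps a column index to a sorted array of boundaries
    (hypothetical argument, modelled on feature_bin_boundaries).
    """
    data = np.asarray(data)
    if data.ndim not in (1, 2):
        # Stand-in for the IncorrectShapeError raised by the real method.
        raise ValueError('Expected a 1- or 2-dimensional array.')
    discretised = data.copy()
    for index, boundaries in bin_boundaries.items():
        # The ellipsis selects a whole column of a 2-D array and a single
        # element of a 1-D data point, so both cases share one code path.
        discretised[..., index] = np.searchsorted(
            boundaries, data[..., index], side='left')
    return discretised


# Only column 0 is numerical in this example; column 1 is left untouched.
bin_boundaries = {0: np.array([2.0, 3.0, 4.0])}

points = np.array([[1.0, 7.0], [3.5, 8.0], [9.0, 9.0]])
print(discretise_sketch(points, bin_boundaries).tolist())
# [[0.0, 7.0], [2.0, 8.0], [3.0, 9.0]]

single_point = np.array([2.0, 5.0])
print(discretise_sketch(single_point, bin_boundaries).tolist())
# [0.0, 5.0]
```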