fatf.utils.models.models.KNN

class fatf.utils.models.models.KNN(k=3, mode=None)

A K-Nearest Neighbours model based on Euclidean distance.

When the k parameter is set to 0, the model works as a majority class classifier. If the vote among the k nearest neighbours results in a tie, the overall majority class of the whole training data is returned. Finally, when the training data contains categorical (i.e. non-numerical, e.g. string) columns, the distance for each such column is 0 when the values match and 1 otherwise.
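The three rules above can be illustrated with a small pure-Python sketch. It is not fatf's implementation: the helper names `mixed_distance` and `knn_label` are hypothetical, the per-column 0/1 mismatch terms are assumed to be simply added to the Euclidean distance over the numerical columns, and a tie in the overall training counts is assumed to resolve to the lexicographically smallest label.

```python
from collections import Counter


def mixed_distance(a, b, categorical):
    """Distance between two rows (hypothetical helper).

    Numerical columns contribute a Euclidean distance; each categorical
    column adds 0 on a match and 1 otherwise. Adding the two parts
    together is an assumption of this sketch.
    """
    squared = 0.0
    mismatches = 0
    for i, (u, v) in enumerate(zip(a, b)):
        if i in categorical:
            mismatches += 0 if u == v else 1
        else:
            squared += (u - v) ** 2
    return squared ** 0.5 + mismatches


def knn_label(X, y, query, k, categorical=frozenset()):
    """Classify one query row following the rules described above."""
    counts = Counter(y)
    top = max(counts.values())
    # Overall training majority; ties go to the smallest label (assumption).
    global_majority = min(label for label, c in counts.items() if c == top)
    if k == 0:
        return global_majority  # k=0: majority class classifier
    # Indices of training rows sorted by their distance to the query.
    order = sorted(range(len(X)),
                   key=lambda i: mixed_distance(X[i], query, categorical))
    ranked = Counter(y[i] for i in order[:k]).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return global_majority  # tied vote: fall back to training majority
    return ranked[0][0]


X = [[0.0, "a"], [0.1, "a"], [5.0, "b"], [5.1, "b"], [5.2, "b"]]
y = ["x", "x", "y", "y", "y"]
print(knn_label(X, y, [0.05, "a"], k=3, categorical={1}))  # x
print(knn_label(X, y, [0.05, "a"], k=0, categorical={1}))  # y
print(knn_label(X, y, [2.55, "c"], k=2, categorical={1}))  # y (tied vote)
```

In the last call the two nearest rows carry one "x" and one "y", so the sketch falls back to "y", the majority label of the whole training set.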

This model can operate in two modes: classifier or regressor. The classifier mode works with both categorical and numerical targets and provides two predictive methods: predict, for predicting labels, and predict_proba, for predicting label probabilities. The regressor mode, on the other hand, requires a numerical target and only supports the predict method, which returns the average target value of the k nearest neighbours of the queried data point.
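A minimal sketch of how the two modes could turn the targets of the k nearest neighbours into a single prediction. The function name `predict_from_neighbours` is hypothetical, and tie handling in the classifier vote is left out for brevity.

```python
from collections import Counter


def predict_from_neighbours(neighbour_targets, mode="classifier"):
    """Combine the targets of the k nearest neighbours (hypothetical helper)."""
    if mode in ("regressor", "r"):
        # Regressor mode: average of the neighbours' numerical targets.
        return sum(neighbour_targets) / len(neighbour_targets)
    # Classifier mode: most frequent label among the neighbours
    # (tie handling is omitted in this sketch).
    return Counter(neighbour_targets).most_common(1)[0][0]


print(predict_from_neighbours(["b", "a", "b"]))            # b
print(predict_from_neighbours([1.0, 2.0, 4.5], mode="r"))  # 2.5
```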

Parameters
k : integer, optional (default=3)

The number of neighbours used to make a prediction. Defaults to 3.

mode : string, optional (default=None, i.e. 'classifier')

The mode in which the model will operate: either 'classifier' ('c') or 'regressor' ('r'). In the latter case the predict_proba method is disabled.

Attributes
_MODES : Set[string]

Possible modes of the KNN model: 'classifier' ('c') or 'regressor' ('r').

_k : integer

The number of neighbours used to make a prediction.

_is_classifier : boolean

True when the model is initialised (and operates) as a classifier. False when it acts as a regressor.

_is_fitted : boolean

A Boolean variable indicating whether the model is fitted.

_X : numpy.ndarray

The KNN model training data.

_y : numpy.ndarray

The KNN model training labels.

_X_n : integer

The number of data points in the training set.

_unique_y : numpy.ndarray

An array of the unique labels in the training label set, ordered lexicographically.

_unique_y_counts : numpy.ndarray

An array with counts of the unique labels in the training labels set.

_unique_y_probabilities : numpy.ndarray

Probabilities of labels calculated using their frequencies in the training data.

_majority_label : Union[string, integer, float]

The most common label in the training set.

_is_structured : boolean

A Boolean variable indicating whether the model has been fitted on a structured numpy array.

_categorical_indices : numpy.ndarray

An array with the indices of the categorical columns in the training array.

_numerical_indices : numpy.ndarray

An array with the indices of the numerical columns in the training array.
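The label-related attributes and the numerical/categorical column split can be approximated in plain Python as below. Both helpers (`label_statistics`, `split_columns`) are hypothetical: the documented model works on numpy arrays, whereas this sketch uses lists and a simple `isinstance` rule to decide whether a column is numerical. Resolving a tie for the majority label in favour of the first label in sorted order is also an assumption.

```python
from collections import Counter


def label_statistics(y):
    """Values analogous to the _unique_y* and _majority_label attributes."""
    counts = Counter(y)
    unique_y = sorted(counts)  # lexicographic order, like _unique_y
    unique_y_counts = [counts[label] for label in unique_y]
    unique_y_probabilities = [c / len(y) for c in unique_y_counts]
    # First label with the highest count; on a tie this is the smallest
    # label in sorted order (an assumption of this sketch).
    majority_label = unique_y[unique_y_counts.index(max(unique_y_counts))]
    return unique_y, unique_y_counts, unique_y_probabilities, majority_label


def split_columns(X):
    """Split column indices into numerical and categorical ones.

    A column counts as numerical when every value is an int or a float
    (bool excluded); this rule stands in for a numpy dtype check.
    """
    numerical, categorical = [], []
    for j in range(len(X[0])):
        if all(isinstance(row[j], (int, float))
               and not isinstance(row[j], bool) for row in X):
            numerical.append(j)
        else:
            categorical.append(j)
    return numerical, categorical


print(label_statistics(["b", "a", "b", "c"]))
# (['a', 'b', 'c'], [1, 2, 1], [0.25, 0.5, 0.25], 'b')
print(split_columns([[1.0, "a", 2], [3.5, "b", 4]]))  # ([0, 2], [1])
```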

Raises
PrefittedModelError

Raised when trying to fit a model that has already been fitted. Usually raised when calling the fit method for the second time. Try using the clear method to reset the model before fitting it again.

TypeError

The k parameter is not an integer.

UnfittedModelError

Raised when trying to predict data with a model that has not been fitted yet. Try using the fit method to fit the model first.

ValueError

The k parameter is a negative number or the mode parameter does not have one of the allowed values: 'c', 'classifier', 'r' or 'regressor'.
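The TypeError and ValueError conditions listed above could be checked along these lines. `validate_knn_params` is a hypothetical helper, not part of fatf; mapping `mode=None` to 'classifier' follows the documented default.

```python
_MODES = {"classifier", "c", "regressor", "r"}


def validate_knn_params(k=3, mode=None):
    """Check constructor arguments as described under Raises.

    Returns a (k, is_classifier) tuple; a sketch, not fatf's code.
    """
    if not isinstance(k, int):
        raise TypeError("The k parameter has to be an integer.")
    if k < 0:
        raise ValueError("The k parameter has to be a non-negative integer.")
    if mode is None:
        mode = "classifier"  # documented default mode
    if mode not in _MODES:
        raise ValueError("mode has to be one of 'c', 'classifier', "
                         "'r' or 'regressor'.")
    return k, mode in ("classifier", "c")


print(validate_knn_params())        # (3, True)
print(validate_knn_params(0, "r"))  # (0, False)
```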

Methods

clear()

Clears (unfits) the model.

fit(X, y)

Fits the model.

predict(X)

Predicts labels of new instances with the fitted model.

predict_proba(X)

Calculates label probabilities for new instances with the fitted model.

clear()

Clears (unfits) the model.

Raises
UnfittedModelError

Raised when trying to clear a model that has not been fitted yet. Try using the fit method to fit the model first.
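The fitted-state rules spread across fit, predict and clear amount to a small state machine. The sketch below tracks only the `_is_fitted` flag; the class name `FitLifecycle` is hypothetical and the two exception classes are local stand-ins for the library exceptions of the same names.

```python
class PrefittedModelError(Exception):
    """Local stand-in for the library exception of the same name."""


class UnfittedModelError(Exception):
    """Local stand-in for the library exception of the same name."""


class FitLifecycle:
    """Tracks the fitted state the way the fit/predict/clear docs describe."""

    def __init__(self):
        self._is_fitted = False

    def fit(self):
        if self._is_fitted:
            raise PrefittedModelError("Already fitted; call clear() first.")
        self._is_fitted = True
        return self

    def predict(self):
        if not self._is_fitted:
            raise UnfittedModelError("Fit the model before predicting.")
        return "prediction"

    def clear(self):
        if not self._is_fitted:
            raise UnfittedModelError("Cannot clear an unfitted model.")
        self._is_fitted = False


model = FitLifecycle().fit()
model.clear()  # back to the unfitted state
model.fit()    # refitting after clear() is allowed
```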

fit(X, y)

Fits the model.

Parameters
X : numpy.ndarray

The KNN training data.

y : numpy.ndarray

The KNN training labels.

Raises
IncorrectShapeError

Raised when the X array is not 2-dimensional, the y array is not 1-dimensional, the number of rows in X differs from the number of elements in y, or the X array has 0 rows or 0 columns.

PrefittedModelError

Trying to fit the model when it has already been fitted. Usually raised when calling the fit method for the second time without clearing the model first.

TypeError

Raised when fitting a KNN predictor in regressor mode with a non-numerical target variable.
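The fit-time checks above can be mirrored on plain Python lists of rows, as in this sketch. `check_fit_inputs` is a hypothetical helper and `IncorrectShapeError` is a local stand-in; treating rows of unequal length as "not 2-dimensional" is an assumption that approximates what a numpy array would enforce.

```python
class IncorrectShapeError(Exception):
    """Local stand-in for the library exception of the same name."""


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_fit_inputs(X, y, is_classifier=True):
    """Apply the fit-time checks listed above to lists of rows."""
    if not X or not all(isinstance(row, (list, tuple)) for row in X):
        raise IncorrectShapeError("X must be 2-dimensional with at least one row.")
    n_columns = len(X[0])
    if n_columns == 0 or any(len(row) != n_columns for row in X):
        raise IncorrectShapeError("X needs at least one column and equal-length rows.")
    if any(isinstance(value, (list, tuple)) for value in y):
        raise IncorrectShapeError("y must be 1-dimensional.")
    if len(X) != len(y):
        raise IncorrectShapeError("X and y must have the same number of rows.")
    if not is_classifier and not all(_is_number(value) for value in y):
        raise TypeError("A regressor needs a numerical target variable.")


check_fit_inputs([[1.0, "a"], [2.0, "b"]], ["x", "y"])  # passes silently
```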

predict(X)

Predicts labels of new instances with the fitted model.

Parameters
X : numpy.ndarray

The data for which labels will be predicted.

Returns
predictions : numpy.ndarray

Predicted class labels (classifier mode) or averaged neighbour target values (regressor mode) for each data point.

Raises
IncorrectShapeError

X is not a 2-dimensional array, it has 0 rows or it has a different number of columns than the training data.

UnfittedModelError

Raised when trying to predict data when the model has not been fitted yet. Try using the fit method to fit the model first.

ValueError

X has a different dtype than the data used to fit the model.
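The input checks performed by predict can be sketched for lists of rows as follows. `check_predict_input` is a hypothetical helper; because plain lists have no dtype, a mismatch between numerical and non-numerical values per column is used here as a stand-in for the numpy dtype comparison the documentation refers to.

```python
class IncorrectShapeError(Exception):
    """Local stand-in for the library exception of the same name."""


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_predict_input(X, train_kinds):
    """Checks listed for predict, applied to lists of rows.

    train_kinds holds one entry per training column, True for numerical
    columns; a kind mismatch stands in for numpy's dtype mismatch.
    """
    if not X or not all(isinstance(row, (list, tuple)) for row in X):
        raise IncorrectShapeError("X must be 2-dimensional with at least one row.")
    if any(len(row) != len(train_kinds) for row in X):
        raise IncorrectShapeError("X must have as many columns as the training data.")
    for row in X:
        if [_is_number(value) for value in row] != list(train_kinds):
            raise ValueError("X has a different dtype than the training data.")


check_predict_input([[0.5, "a"]], [True, False])  # passes silently
```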

predict_proba(X)

Calculates label probabilities for new instances with the fitted model.

Parameters
X : numpy.ndarray

The data for which label probabilities will be predicted.

Returns
probabilities : numpy.ndarray

Probabilities of each instance belonging to every class. The columns of the returned array correspond to the labels in lexicographic order.

Raises
IncorrectShapeError

X is not a 2-dimensional array, it has 0 rows or it has a different number of columns than the training data.

UnfittedModelError

Raised when trying to predict data when the model has not been fitted yet. Try using the fit method to fit the model first.

RuntimeError

Raised when this method is called on a predictor initialised as a regressor.

ValueError

X has a different dtype than the data used to fit the model.
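For a single query point, the probability output could look like the sketch below, assuming each label's probability is the fraction of the k neighbours that carry it. `proba_from_neighbours` is a hypothetical helper; it shows the lexicographic column order and the RuntimeError raised in regressor mode.

```python
from collections import Counter


def proba_from_neighbours(neighbour_labels, unique_y, is_classifier=True):
    """Label probabilities for one query point (hypothetical helper).

    The probability of a label is assumed to be the fraction of the k
    neighbours carrying it; columns follow the sorted label order.
    """
    if not is_classifier:
        raise RuntimeError("predict_proba is disabled in regressor mode.")
    counts = Counter(neighbour_labels)
    k = len(neighbour_labels)
    return [counts[label] / k for label in sorted(unique_y)]


print(proba_from_neighbours(["b", "a", "b", "b"], ["c", "a", "b"]))
# [0.25, 0.75, 0.0]
```

Labels that none of the neighbours carry still get a column (here "c" with probability 0.0), so every row of the output has one entry per training label.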