`fatf.utils.models.models`.KNN

class fatf.utils.models.models.KNN(k: int = 3, mode: Optional[str] = None)

A K-Nearest Neighbours model based on Euclidean distance.

When the k parameter is set to 0, the model works as a majority class classifier. If the vote among the k neighbours results in a tie, the overall majority class of the whole training data is returned. Finally, when the training data contain categorical (i.e. non-numerical, e.g. string) columns, the distance for these columns is 0 when the values match and 1 otherwise.
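The distance rule for mixed data can be illustrated with a minimal sketch. The function name `mixed_distance` and its `categorical_indices` argument are hypothetical, and adding the mismatch count to the Euclidean part is an assumption; the text above only states the per-column 0/1 rule, not how the two parts are combined.

```python
import math


def mixed_distance(a, b, categorical_indices):
    """Distance between two rows holding numerical and categorical values.

    Hypothetical helper: numerical columns contribute a Euclidean term,
    categorical columns contribute 0 on a match and 1 on a mismatch.
    Summing the two parts is an assumption made for this sketch.
    """
    squared = 0.0
    mismatches = 0
    for index, (x, y) in enumerate(zip(a, b)):
        if index in categorical_indices:
            mismatches += 0 if x == y else 1
        else:
            squared += (x - y) ** 2
    return math.sqrt(squared) + mismatches


# sqrt(4**2 + 3**2) = 5.0 for the numerical columns, plus 1 categorical mismatch.
distance = mixed_distance((0.0, 3.0, 'a'), (4.0, 0.0, 'b'), {2})
```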

This model can operate in two modes: classifier or regressor. The classifier mode works for categorical and numerical targets and provides two predictive methods: predict, for predicting labels, and predict_proba, for predicting label probabilities. The regressor mode, on the other hand, requires the target to be numerical and only supports the predict method, which returns the average target value of the k neighbours of the queried data point.
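How the two modes turn the neighbours' targets into a prediction might look roughly as follows. This is a sketch under stated assumptions, not the library's code: `classify` and `regress` are hypothetical names, and an empty neighbour list stands in for the k = 0 case described earlier.

```python
from collections import Counter


def classify(neighbour_labels, majority_label):
    """Majority vote among the neighbours (hypothetical helper).

    Falls back to the training set majority label when there are no
    neighbours (k = 0) or when the two most frequent labels tie.
    """
    ranked = Counter(neighbour_labels).most_common()
    if not ranked:
        return majority_label
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return majority_label
    return ranked[0][0]


def regress(neighbour_targets):
    """Average target of the neighbours; assumes at least one neighbour."""
    return sum(neighbour_targets) / len(neighbour_targets)
```

For instance, `classify(['a', 'b'], 'b')` is a tie and therefore resolves to the supplied majority label.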

Parameters

k (integer, optional, default=3): The number of neighbours used to make a prediction.
mode (string, optional, default='classifier'): The mode in which the model will operate. Either 'classifier' ('c') or 'regressor' ('r'). In the latter case the predict_proba method is disabled.

Attributes

_MODES (Set[string]): Possible modes of the KNN model: 'classifier' ('c') or 'regressor' ('r').
_k (integer): The number of neighbours used to make a prediction.
_is_classifier (boolean): True when the model is initialised (and operates) as a classifier. False when it acts as a regressor.
_is_fitted (boolean): A Boolean variable indicating whether the model is fitted.
_X (numpy.ndarray): The KNN model training data.
_y (numpy.ndarray): The KNN model training labels.
_X_n (integer): The number of data points in the training set.
_unique_y (numpy.ndarray): An array with unique labels in the training labels set ordered lexicographically.
_unique_y_counts (numpy.ndarray): An array with counts of the unique labels in the training labels set.
_unique_y_probabilities (numpy.ndarray): Probabilities of labels calculated using their frequencies in the training data.
_majority_label (Union[string, integer, float]): The most common label in the training set.
_is_structured (boolean): A Boolean variable indicating whether the model has been fitted on a structured numpy array.
_categorical_indices (numpy.ndarray): An array with categorical indices in the training array.
_numerical_indices (numpy.ndarray): An array with numerical indices in the training array.
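The label-related attributes listed above can all be derived from the training labels. A minimal sketch using plain Python lists instead of numpy arrays; `label_statistics` is a hypothetical helper, and breaking a tie for the majority label by taking the lexicographically first label is an assumption.

```python
from collections import Counter


def label_statistics(y):
    """Derive unique labels, counts, frequencies and the majority label.

    Hypothetical helper mirroring what _unique_y, _unique_y_counts,
    _unique_y_probabilities and _majority_label are described as holding.
    """
    counts = Counter(y)
    unique_y = sorted(counts)  # lexicographic order
    unique_y_counts = [counts[label] for label in unique_y]
    unique_y_probabilities = [count / len(y) for count in unique_y_counts]
    # max() keeps the first maximum, so ties go to the first sorted label
    # (an assumption of this sketch).
    majority_label = max(unique_y, key=counts.get)
    return unique_y, unique_y_counts, unique_y_probabilities, majority_label


stats = label_statistics(['b', 'a', 'b', 'c'])
```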

Raises

PrefittedModelError: Raised when trying to fit a model that has already been fitted. Usually raised when calling the fit method for the second time. Try using the clear method to reset the model before fitting it again.
TypeError: The k parameter is not an integer.
UnfittedModelError: Raised when trying to predict data with a model that has not been fitted yet. Try using the fit method to fit the model first.
ValueError: The k parameter is a negative number or the mode parameter does not have one of the allowed values: 'c', 'classifier', 'r' or 'regressor'.
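The checks behind the TypeError and ValueError entries above could be sketched as below. `validate_knn_args` is a hypothetical stand-alone function, and treating `mode=None` as classifier mode is an assumption that reconciles the `None` default in the signature with the documented 'classifier' default.

```python
CLASSIFIER_MODES = {'classifier', 'c'}
REGRESSOR_MODES = {'regressor', 'r'}


def validate_knn_args(k=3, mode=None):
    """Validate k and mode; return (k, is_classifier).

    Hypothetical helper, not part of the documented class.
    """
    if not isinstance(k, int):
        raise TypeError('The k parameter has to be an integer.')
    if k < 0:
        raise ValueError('The k parameter has to be non-negative.')
    if mode is None or mode in CLASSIFIER_MODES:
        return k, True  # None is assumed to mean classifier mode
    if mode in REGRESSOR_MODES:
        return k, False
    raise ValueError("The mode has to be 'c', 'classifier', 'r' or 'regressor'.")
```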

Methods

`clear`()	Clears (unfits) the model.
`fit`(X, y)	Fits the model.
`predict`(X)	Predicts labels of new instances with the fitted model.
`predict_proba`(X)	Calculates label probabilities for new instances with the fitted model.

clear() → None

Clears (unfits) the model.

Raises

UnfittedModelError: Raised when trying to clear a model that has not been fitted yet. Try using the fit method to fit the model first.
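The fit and clear life cycle implied by PrefittedModelError and UnfittedModelError can be pictured as a two-state switch. The sketch below defines local stand-in exception classes with the same names and a hypothetical `FitState` class; it illustrates the documented behaviour only and does not reproduce the library's implementation.

```python
class PrefittedModelError(Exception):
    """Local stand-in for the exception of the same name."""


class UnfittedModelError(Exception):
    """Local stand-in for the exception of the same name."""


class FitState:
    """Hypothetical class tracking only the fitted/unfitted state."""

    def __init__(self):
        self._is_fitted = False

    def fit(self):
        if self._is_fitted:
            raise PrefittedModelError('Clear the model before refitting it.')
        self._is_fitted = True

    def clear(self):
        if not self._is_fitted:
            raise UnfittedModelError('Fit the model before clearing it.')
        self._is_fitted = False
```

Calling `fit`, then `clear`, then `fit` again succeeds, whereas two consecutive `fit` calls or a `clear` on a fresh object raise.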

fit(X: numpy.ndarray, y: numpy.ndarray) → None

Fits the model.

Parameters

X (numpy.ndarray): The KNN training data.
y (numpy.ndarray): The KNN training labels.

Raises

IncorrectShapeError: Either the X array is not 2-dimensional, the y array is not 1-dimensional, the number of rows in X is not the same as the number of elements in y or the X array has 0 rows or 0 columns.
PrefittedModelError: Trying to fit the model when it has already been fitted. Usually raised when calling the fit method for the second time without clearing the model first.
TypeError: Trying to fit a KNN predictor in regressor mode with a non-numerical target variable.

predict(X: numpy.ndarray) → numpy.ndarray

Predicts labels of new instances with the fitted model.

Parameters

X (numpy.ndarray): The data for which labels will be predicted.

Returns

predictions (numpy.ndarray): Predicted class labels for each data point in classifier mode, or the averaged target values of the neighbours in regressor mode.

Raises

IncorrectShapeError: X is not a 2-dimensional array, it has 0 rows or it has a different number of columns than the training data.
UnfittedModelError: Raised when trying to predict data when the model has not been fitted yet. Try using the fit method to fit the model first.
ValueError: X has a different dtype than the data used to fit the model.

predict_proba(X: numpy.ndarray) → numpy.ndarray

Calculates label probabilities for new instances with the fitted model.

Parameters

X (numpy.ndarray): The data for which label probabilities will be predicted.

Returns

probabilities (numpy.ndarray): Probabilities of each instance belonging to every class. The labels in the returned array are ordered lexicographically.

Raises

IncorrectShapeError: X is not a 2-dimensional array, it has 0 rows or it has a different number of columns than the training data.
UnfittedModelError: Raised when trying to predict data when the model has not been fitted yet. Try using the fit method to fit the model first.
RuntimeError: Raised when trying to use this method when the predictor is initialised as a regressor.
ValueError: X has a different dtype than the data used to fit the model.
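One plausible reading of the returned probabilities is the share of neighbours carrying each label, reported in the lexicographic label order. The sketch below makes that assumption explicit; `neighbour_probabilities` is a hypothetical helper working on a single query point with at least one neighbour, and the documentation above does not confirm the exact formula.

```python
def neighbour_probabilities(neighbour_labels, unique_labels):
    """Fraction of neighbours per label, ordered like unique_labels.

    Hypothetical helper: unique_labels is expected to be sorted
    lexicographically and neighbour_labels to be non-empty.
    """
    k = len(neighbour_labels)
    return [neighbour_labels.count(label) / k for label in unique_labels]


# Two of four neighbours are 'b'; one is 'a'; one is 'c'.
row = neighbour_probabilities(['b', 'a', 'b', 'c'], ['a', 'b', 'c'])
```

Labels that never occur among the neighbours still receive an entry (of 0.0), so every row has one value per training label.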
