fatf.utils.models.models.KNN
-
class fatf.utils.models.models.KNN(k: int = 3, mode: Optional[str] = None)
A K-Nearest Neighbours model based on Euclidean distance.
When the k parameter is set to 0, the model works as a majority class classifier. If the vote among the k nearest neighbours results in a tie, the overall majority class of the whole training data is returned. Finally, when the training data contain categorical (i.e. non-numerical, e.g. string) columns, the distance for these columns is 0 when the values match and 1 otherwise.
This model can operate in two modes: classifier or regressor. The classifier mode works for categorical and numerical targets and provides two predictive methods: predict for predicting labels and predict_proba for predicting probabilities of labels. The regressor mode, on the other hand, requires the target to be numerical and only supports the predict method, which returns the average of the target values of the k neighbours of the queried data point.
- Parameters
- k : integer, optional (default=3)
The number of neighbours used to make a prediction.
- mode : string, optional (default='classifier')
The mode in which the model will operate. Either 'classifier' ('c') or 'regressor' ('r'). When left unset (None, as in the signature), the model operates as a classifier. In regressor mode the predict_proba method is disabled.
- Attributes
- _MODES : Set[string]
Possible modes of the KNN model: 'classifier' ('c') or 'regressor' ('r').
- _k : integer
The number of neighbours used to make a prediction.
- _is_classifier : boolean
True when the model is initialised (and operates) as a classifier. False when it acts as a regressor.
- _is_fitted : boolean
A Boolean variable indicating whether the model is fitted.
- _X : numpy.ndarray
The KNN model training data.
- _y : numpy.ndarray
The KNN model training labels.
- _X_n : integer
The number of data points in the training set.
- _unique_y : numpy.ndarray
An array with the unique labels in the training labels set, ordered lexicographically.
- _unique_y_counts : numpy.ndarray
An array with counts of the unique labels in the training labels set.
- _unique_y_probabilities : numpy.ndarray
Probabilities of labels calculated using their frequencies in the training data.
- _majority_label : Union[string, integer, float]
The most common label in the training set.
- _is_structured : boolean
A Boolean variable indicating whether the model has been fitted on a structured numpy array.
- _categorical_indices : numpy.ndarray
An array with categorical indices in the training array.
- _numerical_indices : numpy.ndarray
An array with numerical indices in the training array.
- Raises
- PrefittedModelError
Raised when trying to fit a model that has already been fitted. Usually raised when calling the fit method for the second time. Try using the clear method to reset the model before fitting it again.
- TypeError
The k parameter is not an integer.
- UnfittedModelError
Raised when trying to predict data with a model that has not been fitted yet. Try using the fit method to fit the model first.
- ValueError
The k parameter is a negative number or the mode parameter does not have one of the allowed values: 'c', 'classifier', 'r' or 'regressor'.
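The distance rule described above (Euclidean on numerical columns, 0 or 1 on categorical columns) can be sketched in a few lines. This is an illustration only, not the library's implementation: the helper name mixed_distance is hypothetical, and adding the categorical mismatch count to the Euclidean part is an assumption, since the text does not say how the two parts are combined.

```python
import numpy as np


def mixed_distance(a, b, categorical_indices):
    """Distance between two rows with numerical and categorical columns.

    Assumption: the Euclidean part (numerical columns) and the 0/1
    mismatch part (categorical columns) are simply added together.
    """
    categorical = set(categorical_indices)
    squared_sum = 0.0
    mismatches = 0
    for i, (u, v) in enumerate(zip(a, b)):
        if i in categorical:
            # Categorical column: 0 when the values match, 1 otherwise.
            mismatches += 0 if u == v else 1
        else:
            # Numerical column: contributes to the Euclidean distance.
            squared_sum += (float(u) - float(v)) ** 2
    return float(np.sqrt(squared_sum)) + mismatches


row_a = (1.0, 2.0, 'red')
row_b = (4.0, 6.0, 'blue')
print(mixed_distance(row_a, row_b, categorical_indices=[2]))  # 6.0
```

Here the numerical columns contribute sqrt(3^2 + 4^2) = 5 and the single mismatching categorical column contributes 1.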
Methods
clear(): Clears (unfits) the model.
fit(X, y): Fits the model.
predict(X): Predicts labels of new instances with the fitted model.
predict_proba(X): Calculates label probabilities for new instances with the fitted model.
-
clear() → None
Clears (unfits) the model.
- Raises
- UnfittedModelError
Raised when trying to clear a model that has not been fitted yet. Try using the fit method to fit the model first.
-
fit(X: numpy.ndarray, y: numpy.ndarray) → None
Fits the model.
- Parameters
- X : numpy.ndarray
The KNN training data.
- y : numpy.ndarray
The KNN training labels.
- Raises
- IncorrectShapeError
Either the X array is not 2-dimensional, the y array is not 1-dimensional, the number of rows in X is not the same as the number of elements in y, or the X array has 0 rows or 0 columns.
- PrefittedModelError
Trying to fit the model when it has already been fitted. Usually raised when calling the fit method for the second time without clearing the model first.
- TypeError
Trying to fit a KNN predictor in regressor mode with a non-numerical target variable.
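The shape conditions listed under IncorrectShapeError can be checked before calling fit. The sketch below is a stand-alone approximation for plain (unstructured) numpy arrays only: check_fit_shapes is a hypothetical helper, it raises ValueError rather than the library's own exception, and it does not cover structured arrays.

```python
import numpy as np


def check_fit_shapes(X, y):
    """Hypothetical pre-check mirroring the shape conditions listed for fit.

    Raises ValueError (not the library's IncorrectShapeError) on failure.
    """
    if X.ndim != 2:
        raise ValueError('X must be a 2-dimensional array.')
    if y.ndim != 1:
        raise ValueError('y must be a 1-dimensional array.')
    if X.shape[0] == 0 or X.shape[1] == 0:
        raise ValueError('X must have at least one row and one column.')
    if X.shape[0] != y.shape[0]:
        raise ValueError('X and y must have the same number of rows.')


X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array(['a', 'b', 'a'])
check_fit_shapes(X, y)  # passes silently
```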
-
predict(X: numpy.ndarray) → numpy.ndarray
Predicts labels of new instances with the fitted model.
- Parameters
- X : numpy.ndarray
The data for which labels will be predicted.
- Returns
- predictions : numpy.ndarray
Predicted class labels for each data point. In regressor mode, these are the averaged target values of the k neighbours.
- Raises
- IncorrectShapeError
X is not a 2-dimensional array, it has 0 rows or it has a different number of columns than the training data.
- UnfittedModelError
Raised when trying to predict data when the model has not been fitted yet. Try using the fit method to fit the model first.
- ValueError
X has a different dtype than the data used to fit the model.
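The prediction rules described for this class (majority vote among the neighbours, falling back to the overall majority class on a tie or when k is 0, and averaging in regressor mode) might look roughly as follows. The helpers vote and regress are hypothetical names used for illustration; they operate on neighbour labels that have already been selected and are not the library's code.

```python
from collections import Counter

import numpy as np


def vote(neighbour_labels, majority_label):
    """Majority vote with a fallback to the training-set majority label."""
    counts = Counter(neighbour_labels).most_common()
    if not counts:
        # No neighbours (k = 0): behave as a majority class classifier.
        return majority_label
    top_count = counts[0][1]
    winners = [label for label, count in counts if count == top_count]
    # A unique winner is returned; a tie falls back to the majority label.
    return winners[0] if len(winners) == 1 else majority_label


def regress(neighbour_targets):
    """Regressor mode: average of the neighbours' target values."""
    return float(np.mean(neighbour_targets))


print(vote(['a', 'b', 'a'], majority_label='b'))  # a
print(vote(['a', 'b'], majority_label='c'))       # c
print(regress([1.0, 2.0, 6.0]))                   # 3.0
```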
-
predict_proba(X: numpy.ndarray) → numpy.ndarray
Calculates label probabilities for new instances with the fitted model.
- Parameters
- X : numpy.ndarray
The data for which label probabilities will be predicted.
- Returns
- probabilities : numpy.ndarray
Probabilities of each instance belonging to every class. The labels in the returned array are in lexicographic order.
- Raises
- IncorrectShapeError
X is not a 2-dimensional array, it has 0 rows or it has a different number of columns than the training data.
- UnfittedModelError
Raised when trying to predict data when the model has not been fitted yet. Try using the fit method to fit the model first.
- RuntimeError
Raised when trying to use this method when the predictor is initialised as a regressor.
- ValueError
X has a different dtype than the data used to fit the model.
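To illustrate the column ordering of the returned array, the sketch below computes per-class probabilities for one query point from the labels of its neighbours. It assumes that a probability is the fraction of neighbours carrying a given label, which the text above does not state explicitly; neighbour_probabilities is a hypothetical helper, and it expects at least one neighbour.

```python
import numpy as np


def neighbour_probabilities(neighbour_labels, unique_labels):
    """Fraction of neighbours per class, columns in lexicographic order.

    Assumption: probabilities are neighbour label frequencies.
    """
    # Sort the class labels so the output columns are ordered lexicographically.
    ordered = np.sort(np.asarray(unique_labels))
    neighbours = np.asarray(neighbour_labels)
    counts = np.array([(neighbours == label).sum() for label in ordered])
    return counts / counts.sum()


probabilities = neighbour_probabilities(['b', 'a', 'b'], ['b', 'a', 'c'])
print(probabilities)  # columns correspond to 'a', 'b', 'c'
```

With three neighbours labelled 'b', 'a' and 'b', the columns for 'a', 'b' and 'c' hold 1/3, 2/3 and 0 respectively.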