fatf.utils.data.feature_selection.sklearn.forward_selection

fatf.utils.data.feature_selection.sklearn.forward_selection(dataset: numpy.ndarray, target: numpy.ndarray, weights: Optional[numpy.ndarray] = None, features_number: Optional[int] = None, features_percentage: int = 100) → numpy.ndarray

Selects the specified number of features with forward selection, adding one feature at a time.

New in version 0.1.0.

The weights provided as the input parameter are incorporated into the feature selection via the ridge regression training procedure. If the value of features_percentage results in selecting 0 features, 1 feature will be selected and a warning will be logged.

Note

This feature selection procedure is computationally expensive when the number of features to be selected is large.

This feature selection method is based on LIME (Local Interpretable Model-agnostic Explanations). The original implementation can be found in the lime.lime_base.LimeBase.forward_selection method in the official LIME package.
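The greedy loop behind forward selection can be sketched as follows. This is an illustration only, not the fatf or LIME source: the names weighted_r2 and forward_selection_sketch are hypothetical, and plain weighted least squares (equivalent to a ridge regressor with a zero penalty) stands in for the ridge model, scored with the weighted R². Each round refits one model per remaining feature, which is why the cost grows quickly with the number of features to be selected.

```python
import numpy as np


def weighted_r2(features, target, weights):
    """Fit weighted least squares with an intercept; return the weighted R^2."""
    design = np.column_stack([np.ones(features.shape[0]), features])
    root_w = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns weighted into ordinary least squares.
    coef, *_ = np.linalg.lstsq(
        design * root_w[:, None], target * root_w, rcond=None)
    residual = target - design @ coef
    centred = target - np.average(target, weights=weights)
    return 1.0 - np.sum(weights * residual**2) / np.sum(weights * centred**2)


def forward_selection_sketch(dataset, target, weights=None, features_number=1):
    """Greedily add the feature that most improves the weighted fit."""
    if weights is None:
        # Assumption: missing weights mean equally important data points.
        weights = np.ones(dataset.shape[0])
    selected = []
    for _ in range(min(features_number, dataset.shape[1])):
        best_feature, best_score = None, -np.inf
        for feature in range(dataset.shape[1]):
            if feature in selected:
                continue
            candidate = selected + [feature]
            score = weighted_r2(dataset[:, candidate], target, weights)
            if score > best_score:
                best_feature, best_score = feature, score
        selected.append(best_feature)
    return np.array(selected)


# Synthetic data: column 2 drives the target, column 0 helps, column 1 is noise.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
labels = 3.0 * data[:, 2] + 1.0 * data[:, 0] + 0.1 * rng.normal(size=200)
chosen = forward_selection_sketch(data, labels, features_number=2)
print(chosen)
```

On this synthetic data the sketch picks column 2 first and column 0 second, so the order of the returned indices reflects the order in which the features were added.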

Parameters
dataset : numpy.ndarray

A 2-dimensional numpy array holding a data set.

target : numpy.ndarray

The class/probability/regression values of each row in the input data set.

weights : numpy.ndarray, optional (default=None)

An array of (importance) weights for each data point in the input data set. If None, all of the data points are treated as equally important.

features_number : integer, optional (default=None)

The number of (top) features to be selected. If None, the top x% of the features are selected where x is given by the features_percentage parameter.

features_percentage : integer, optional (default=100)

The percentage of (top) features to be selected. By default all of the features are returned if features_number is None.
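The interplay between features_number and features_percentage can be sketched as below. This is a sketch under stated assumptions rather than the library's code: the helper name resolve_features_number is hypothetical, the percentage is assumed to be rounded down, and the zero-feature case is assumed to be reported through the logging module while the oversized features_number case raises a UserWarning.

```python
import logging
import warnings

logger = logging.getLogger(__name__)


def resolve_features_number(total_features,
                            features_number=None,
                            features_percentage=100):
    """Return how many features to select (hypothetical helper)."""
    if features_number is None:
        # Assumption: the percentage is rounded down to a whole feature count.
        features_number = total_features * features_percentage // 100
        if features_number == 0:
            # Never select zero features; fall back to a single feature.
            logger.warning('The percentage selects 0 features; using 1.')
            features_number = 1
    elif features_number > total_features:
        # More features requested than available: select all of them.
        warnings.warn('features_number exceeds the number of features; '
                      'all of the features are selected.', UserWarning)
        features_number = total_features
    return features_number


print(resolve_features_number(10, features_percentage=25))
print(resolve_features_number(10, features_percentage=5))
```

Under these assumptions, 25% of 10 features resolves to 2 features, while 5% of 10 features would resolve to 0 and is therefore bumped to 1.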

Returns
feature_indices : numpy.ndarray

Array with indices of features chosen with forward selection.

Raises
IncorrectShapeError

The dataset array is not 2-dimensional. The target array is not 1-dimensional. The number of elements in the target array differs from the number of samples in the dataset array. The weights array is not 1-dimensional. The number of weights in the weights array does not agree with the number of samples in the dataset array.

TypeError

One of the dataset, target or weights arrays is not purely numerical. The features_number parameter is not an integer. The features_percentage parameter is not an integer.

ValueError

The features_number parameter is not a positive integer. The features_percentage parameter is outside of the allowed range 0–100 (inclusive).

Warns
UserWarning

The specified features_number is larger than the number of features in the dataset array; all of the features are selected.