fatf.utils.data.feature_selection.sklearn.forward_selection
fatf.utils.data.feature_selection.sklearn.forward_selection(dataset: numpy.ndarray, target: numpy.ndarray, weights: Optional[numpy.ndarray] = None, features_number: Optional[int] = None, features_percentage: int = 100) → numpy.ndarray
Selects the specified number of features based on iterative importance.
New in version 0.1.0.
The weights provided as the input parameter are incorporated into the feature selection via the ridge regression training procedure. If the value of features_percentage results in selecting 0 features, 1 feature will be selected and a warning will be logged.
Note
This feature selection procedure is computationally expensive when the number of features to be selected is large.
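The cost grows because every selection step refits one model per remaining candidate feature: choosing k out of d features takes d + (d - 1) + ... + (d - k + 1) fits. The following is a minimal sketch of such a greedy loop, assuming a closed-form weighted ridge fit scored by weighted R²; it is not the fatf implementation. The names weighted_ridge_r2 and forward_selection_sketch and the alpha value are hypothetical and chosen for illustration only.

```python
import numpy as np


def weighted_ridge_r2(X, y, w, alpha=1e-8):
    """Fit a weighted ridge model with an intercept; return weighted R^2.

    Hypothetical helper: alpha is an assumed, near-zero penalty.
    """
    # Centre with weighted means so that the intercept is not penalised.
    x_mean = np.average(X, axis=0, weights=w)
    y_mean = np.average(y, weights=w)
    Xc, yc = X - x_mean, y - y_mean
    # Closed-form solution of (Xc' W Xc + alpha * I) beta = Xc' W yc.
    gram = Xc.T @ (Xc * w[:, None]) + alpha * np.eye(X.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ (w * yc))
    residual = yc - Xc @ beta
    return 1.0 - np.sum(w * residual ** 2) / np.sum(w * yc ** 2)


def forward_selection_sketch(dataset, target, weights=None, features_number=1):
    """Greedily add the feature that most improves the weighted fit."""
    if weights is None:
        # No weights given: treat all data points as equally important.
        weights = np.ones(dataset.shape[0])
    selected = []
    for _ in range(min(features_number, dataset.shape[1])):
        best_feature, best_score = None, -np.inf
        # One model fit per feature that has not been selected yet.
        for feature in range(dataset.shape[1]):
            if feature in selected:
                continue
            candidate = selected + [feature]
            score = weighted_ridge_r2(dataset[:, candidate], target, weights)
            if score > best_score:
                best_feature, best_score = feature, score
        selected.append(best_feature)
    return np.array(selected)


# Toy data: the target depends strongly on column 2 and weakly on column 0.
rng = np.random.RandomState(42)
data = rng.normal(size=(100, 3))
labels = 5.0 * data[:, 2] + 1.0 * data[:, 0]
chosen = forward_selection_sketch(data, labels, features_number=2)
```

The returned indices are ordered by the step at which each feature was added, so the first entry is the single most predictive feature under this scoring rule.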
This feature selection method is based on LIME (Local Interpretable Model-agnostic Explanations). The original implementation can be found in the lime.lime_base.LimeBase.forward_selection method in the official LIME package.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array holding a data set.
- target : numpy.ndarray
The class/probability/regression values of each row in the input data set.
- weights : numpy.ndarray, optional (default=None)
An array of (importance) weights for each data point in the input data set. If None, all of the data points are treated as equally important.
- features_number : integer, optional (default=None)
The number of (top) features to be selected. If None, the top x% of the features are selected, where x is given by the features_percentage parameter.
- features_percentage : integer, optional (default=100)
The percentage of (top) features to be selected. With the default value of 100, all of the features are selected when features_number is None.
- Returns
- feature_indices : numpy.ndarray
Array with indices of features chosen with forward selection.
- Raises
- IncorrectShapeError
The dataset array is not 2-dimensional. The target array is not 1-dimensional. The number of elements in the target array is different from the number of samples in the dataset array. The weights array is not 1-dimensional. The number of weights in the weights array does not agree with the number of samples in the dataset array.
- TypeError
One of the dataset, target or weights arrays is not purely numerical. The features_number parameter is not an integer. The features_percentage parameter is not an integer.
- ValueError
The features_number parameter is not a positive integer. The features_percentage parameter is outside of the allowed range 0–100 (inclusive).
- Warns
- UserWarning
The specified features_number is larger than the number of features in the dataset array; all of the features are selected.
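The interaction between features_number and features_percentage described above can be sketched as a small helper. This is an illustration only: resolve_features_number is a hypothetical name, the messages are invented, and rounding the percentage down is an assumption, since this page does not say how fractional feature counts are rounded.

```python
import logging
import warnings

logger = logging.getLogger(__name__)


def resolve_features_number(features_count, features_number=None,
                            features_percentage=100):
    """Hypothetical helper: work out how many features to select."""
    if not isinstance(features_percentage, int):
        raise TypeError('features_percentage must be an integer.')
    if not 0 <= features_percentage <= 100:
        raise ValueError('features_percentage must be within 0-100.')

    if features_number is None:
        # Assumption: fractional counts are rounded down.
        features_number = (features_count * features_percentage) // 100
        if features_number == 0:
            # Never return an empty selection; log it and fall back to 1.
            logger.warning('The percentage selects 0 features; using 1.')
            features_number = 1
        return features_number

    if not isinstance(features_number, int):
        raise TypeError('features_number must be an integer.')
    if features_number <= 0:
        raise ValueError('features_number must be a positive integer.')
    if features_number > features_count:
        # Too many features requested: warn and select all of them.
        warnings.warn('features_number exceeds the number of features; '
                      'all of the features are selected.', UserWarning)
        features_number = features_count
    return features_number
```

In this sketch an explicit features_number takes precedence, and features_percentage is only consulted when features_number is None.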