fatf.utils.data.feature_selection.sklearn.forward_selection
fatf.utils.data.feature_selection.sklearn.forward_selection(dataset: numpy.ndarray, target: numpy.ndarray, weights: Optional[numpy.ndarray] = None, features_number: Optional[int] = None, features_percentage: int = 100) → numpy.ndarray
Selects the specified number of features based on iterative importance.
New in version 0.1.0.
The weights provided as the input parameter are incorporated into the feature selection via the ridge regression training procedure. If the value of features_percentage results in selecting 0 features, 1 feature will be selected and a warning will be logged.
Note
This feature selection procedure is computationally expensive when the number of features to be selected is large.
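
The greedy procedure described above can be illustrated with a minimal sketch. It does not reproduce the library's code: plain NumPy weighted least squares (with an optional ridge penalty `alpha`) stands in for the ridge model, and the helper names `weighted_r2` and `forward_selection_sketch` are hypothetical. In each round, every remaining feature is tried together with the features chosen so far, and the one giving the highest weighted R² is kept, which is why the cost grows quickly with the number of features to select.

```python
import itertools

import numpy as np


def weighted_r2(X, y, w, alpha=0.0):
    """Fit a weighted ridge model with an intercept; return its weighted R^2."""
    # Centre with weighted means so that the intercept is not penalised.
    x_mean = np.average(X, axis=0, weights=w)
    y_mean = np.average(y, weights=w)
    Xc, yc = X - x_mean, y - y_mean
    sqrt_w = np.sqrt(w)
    A = Xc * sqrt_w[:, None]
    b = yc * sqrt_w
    # Ridge normal equations: (A^T A + alpha * I) coef = A^T b.
    gram = A.T @ A + alpha * np.eye(X.shape[1])
    coef = np.linalg.lstsq(gram, A.T @ b, rcond=None)[0]
    residual = yc - Xc @ coef
    return 1.0 - np.sum(w * residual ** 2) / np.sum(w * yc ** 2)


def forward_selection_sketch(dataset, target, weights=None,
                             features_number=1, alpha=0.0):
    """Greedily pick features that most improve the weighted fit (sketch)."""
    if weights is None:
        weights = np.ones(dataset.shape[0])  # all points equally important
    n_select = min(features_number, dataset.shape[1])
    selected = []
    for _ in range(n_select):
        best_score, best_feature = -np.inf, None
        for feature in range(dataset.shape[1]):
            if feature in selected:
                continue
            columns = selected + [feature]
            score = weighted_r2(dataset[:, columns], target, weights, alpha)
            if score > best_score:
                best_score, best_feature = score, feature
        selected.append(best_feature)
    return np.array(selected)


# Toy data: a 2^3 factorial design, so the three columns are orthogonal.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0]
print(forward_selection_sketch(X, y, features_number=2))  # prints [2 0]
```

The returned indices are ordered by the round in which each feature was picked; whether the library returns them in that order is not stated here, so treat the ordering as an assumption of the sketch.
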
This feature selection method is based on LIME (Local Interpretable Model-agnostic Explanations). The original implementation can be found in the lime.lime_base.LimeBase.forward_selection method in the official LIME package.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array holding a data set.
- target : numpy.ndarray
The class/probability/regression values of each row in the input data set.
- weights : numpy.ndarray, optional (default=None)
An array of (importance) weights for each data point in the input data set. If None, all of the data points are treated as equally important.
- features_number : integer, optional (default=None)
The number of (top) features to be selected. If None, the top x% of the features are selected, where x is given by the features_percentage parameter.
- features_percentage : integer, optional (default=100)
The percentage of (top) features to be selected. By default all of the features are returned if features_number is None.
- Returns
- feature_indices : numpy.ndarray
Array with indices of features chosen with forward selection.
- Raises
- IncorrectShapeError
The dataset array is not 2-dimensional. The target array is not 1-dimensional. The number of elements in the target array is different from the number of samples in the dataset array. The weights array is not 1-dimensional. The number of weights in the weights array does not agree with the number of samples in the dataset array.
- TypeError
One of the dataset, target or weights arrays is not purely numerical. The features_number parameter is not an integer. The features_percentage parameter is not an integer.
- ValueError
The features_number parameter is not a positive integer. The features_percentage parameter is outside of the allowed range 0–100 (inclusive).
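
The array checks listed under Raises could look roughly like the sketch below. It is only an illustration: the `IncorrectShapeError` class defined here is a local stand-in for the library's exception of the same name, `check_inputs` is a hypothetical helper, the order of the checks is assumed, and only plain (non-structured) numpy arrays are handled.

```python
import numpy as np


class IncorrectShapeError(Exception):
    """Local stand-in for the library's exception of the same name."""


def check_inputs(dataset, target, weights=None):
    """Raise if the arrays violate the documented shape or type rules."""
    if dataset.ndim != 2:
        raise IncorrectShapeError('The dataset array must be 2-dimensional.')
    if target.ndim != 1:
        raise IncorrectShapeError('The target array must be 1-dimensional.')
    if target.shape[0] != dataset.shape[0]:
        raise IncorrectShapeError(
            'The target array needs one element per dataset row.')
    if weights is not None:
        if weights.ndim != 1:
            raise IncorrectShapeError(
                'The weights array must be 1-dimensional.')
        if weights.shape[0] != dataset.shape[0]:
            raise IncorrectShapeError(
                'The weights array needs one weight per dataset row.')
    # Assumption: "purely numerical" is approximated by a numeric dtype.
    for name, array in (('dataset', dataset), ('target', target),
                        ('weights', weights)):
        if array is not None and not np.issubdtype(array.dtype, np.number):
            raise TypeError('The {} array must be numerical.'.format(name))
    return True
```

For example, `check_inputs(np.zeros((4, 2)), np.zeros(3))` raises the stand-in `IncorrectShapeError` because the target has fewer elements than the data set has rows.
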
- Warns
- UserWarning
The specified features_number is larger than the number of features in the dataset array; all of the features are selected.
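
How features_number and features_percentage combine into a final feature count can be sketched as follows. The helper `resolve_features_number` is hypothetical, and rounding the percentage down to a whole number of features is an assumption; the documentation only states that a result of 0 is replaced with 1 and a warning is logged.

```python
import logging
import math
import warnings

logger = logging.getLogger(__name__)


def resolve_features_number(total_features, features_number=None,
                            features_percentage=100):
    """Return how many features to select (illustrative helper)."""
    if features_number is None:
        if not isinstance(features_percentage, int):
            raise TypeError('features_percentage must be an integer.')
        if not 0 <= features_percentage <= 100:
            raise ValueError('features_percentage must be within 0-100.')
        # Assumption: the percentage is rounded down to a whole count.
        count = math.floor(total_features * features_percentage / 100)
        if count == 0:
            logger.warning('0 features requested; selecting 1 instead.')
            count = 1
        return count
    if not isinstance(features_number, int):
        raise TypeError('features_number must be an integer.')
    if features_number < 1:
        raise ValueError('features_number must be a positive integer.')
    if features_number > total_features:
        warnings.warn(
            'features_number exceeds the number of features; '
            'all of the features are selected.', UserWarning)
        return total_features
    return features_number
```

With 10 features, `features_percentage=25` yields 2 under the round-down assumption, `features_percentage=5` falls back to 1, and `features_number=12` is capped at 10 with a `UserWarning`.
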