fatf.utils.data.feature_selection.sklearn.highest_weights
fatf.utils.data.feature_selection.sklearn.highest_weights(dataset: numpy.ndarray, target: numpy.ndarray, weights: Optional[numpy.ndarray] = None, features_number: Optional[int] = None, features_percentage: int = 100) → numpy.ndarray
Selects the specified number of features based on their absolute weight.
New in version 0.1.0.
This feature selection procedure chooses the user-specified number of features based on their highest absolute weight given by a ridge regression fitted to all the features.
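The ranking step described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's code: the function name `top_features_by_ridge_weight`, the parameter `k`, the regularisation strength `alpha=1.0` and the closed-form solver are all hypothetical choices made for this example, and the library's own ridge settings may differ.

```python
import numpy as np


def top_features_by_ridge_weight(dataset, target, weights=None, k=2, alpha=1.0):
    """Return indices of the k features with the largest absolute ridge weight.

    Hypothetical helper: name, `k` and `alpha` are assumptions of this sketch.
    """
    dataset = np.asarray(dataset, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples, n_features = dataset.shape
    if weights is None:
        # No weights given: treat every data point as equally important.
        weights = np.ones(n_samples)
    weights = np.asarray(weights, dtype=float)

    # Centre with weighted means so that an intercept is fitted implicitly.
    x_centred = dataset - np.average(dataset, axis=0, weights=weights)
    y_centred = target - np.average(target, weights=weights)

    # Weighted ridge normal equations: (X'WX + alpha * I) beta = X'Wy.
    xtw = x_centred.T * weights
    gram = xtw @ x_centred + alpha * np.eye(n_features)
    coefficients = np.linalg.solve(gram, xtw @ y_centred)

    # Rank features by absolute weight, largest first, and keep the top k.
    order = np.argsort(-np.abs(coefficients), kind="stable")
    return order[:k]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# The target depends strongly on features 2 and 0, and only weakly on feature 1.
y = 5.0 * X[:, 2] - 3.0 * X[:, 0] + 0.1 * X[:, 1]
selected = top_features_by_ridge_weight(X, y, k=2)
```

In this toy data set features 2 and 0 carry the largest absolute weights, so `selected` holds their indices. Passing a `weights` array changes the fit, and therefore possibly the ranking, in the same way that the per-sample weights influence the ridge regression training.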
The weights provided as the input parameter are incorporated into the feature selection via the ridge regression training procedure. If the value of features_percentage results in selecting 0 features, 1 feature will be selected and a warning will be logged.
This feature selection method is based on LIME (Local Interpretable Model-agnostic Explanations). The original implementation can be found in the lime.lime_base.LimeBase.feature_selection method in the official LIME package.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array holding a data set.
- target : numpy.ndarray
The class/probability/regression values of each row in the input data set.
- weights : numpy.ndarray, optional (default=None)
An array of (importance) weights for each data point in the input data set. If None, all of the data points are treated as equally important.
- features_number : integer, optional (default=None)
The number of (top) features to be selected. If None, the top x% of the features are selected, where x is given by the features_percentage parameter.
- features_percentage : integer, optional (default=100)
The percentage of (top) features to be selected. By default all of the features are returned if features_number is None.
- Returns
- feature_indices : numpy.ndarray
Array with indices of the features with the highest absolute weights (ridge regression coefficients).
- Raises
- IncorrectShapeError
The dataset array is not 2-dimensional. The target array is not 1-dimensional. The number of elements in the target array differs from the number of samples in the dataset array. The weights array is not 1-dimensional. The number of weights in the weights array does not agree with the number of samples in the dataset array.
- TypeError
One of the dataset, target or weights arrays is not purely numerical. The features_number parameter is not an integer. The features_percentage parameter is not an integer.
- ValueError
The features_number parameter is not a positive integer. The features_percentage parameter is outside of the allowed range 0–100 (inclusive).
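The shape requirements listed under IncorrectShapeError can be checked by the caller before the function is invoked. The sketch below is a hypothetical pre-flight check, not part of the library: the helper name `check_shapes` is an assumption, and it raises the built-in `ValueError` as a stand-in for the library's own IncorrectShapeError.

```python
import numpy as np


def check_shapes(dataset, target, weights=None):
    """Raise ValueError if the arrays violate the documented shape rules.

    Hypothetical helper; ValueError stands in for IncorrectShapeError.
    """
    dataset = np.asarray(dataset)
    target = np.asarray(target)
    if dataset.ndim != 2:
        raise ValueError("The dataset array must be 2-dimensional.")
    if target.ndim != 1:
        raise ValueError("The target array must be 1-dimensional.")
    if target.shape[0] != dataset.shape[0]:
        raise ValueError("target needs one element per row of dataset.")
    if weights is not None:
        weights = np.asarray(weights)
        if weights.ndim != 1:
            raise ValueError("The weights array must be 1-dimensional.")
        if weights.shape[0] != dataset.shape[0]:
            raise ValueError("weights needs one element per row of dataset.")
    return True


shapes_ok = check_shapes(np.zeros((3, 2)), np.zeros(3), np.ones(3))
```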
- Warns
- UserWarning
The specified features_number is larger than the number of features in the dataset array; all of the features are selected.
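How features_number and features_percentage interact can be summarised in a few lines. The following is a sketch based only on the behaviour documented above; the helper name `resolve_features_number` is hypothetical, and rounding the percentage down to a whole number of features is an assumption, since the rounding rule is not stated here.

```python
import logging
import warnings

logger = logging.getLogger(__name__)


def resolve_features_number(n_features, features_number=None,
                            features_percentage=100):
    """Return how many features to keep (hypothetical helper)."""
    if features_number is None:
        # Assumption: the percentage is rounded down to whole features.
        features_number = (n_features * features_percentage) // 100
        if features_number == 0:
            # Documented fallback: select 1 feature and log a warning.
            logger.warning("The percentage selects 0 features; using 1.")
            features_number = 1
    elif features_number > n_features:
        # Documented UserWarning: more features requested than available.
        warnings.warn("features_number exceeds the number of features; "
                      "selecting all of them.", UserWarning)
        features_number = n_features
    return features_number


half = resolve_features_number(10, features_percentage=50)
at_least_one = resolve_features_number(10, features_percentage=5)
```

With 10 features, 50% resolves to 5 features, while 5% would round down to 0 and is therefore raised to 1.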