fatf.utils.data.feature_selection.sklearn.lasso_path

fatf.utils.data.feature_selection.sklearn.lasso_path(dataset: numpy.ndarray, target: numpy.ndarray, weights: Optional[numpy.ndarray] = None, features_number: Optional[int] = None, features_percentage: int = 100) → numpy.ndarray

Selects the specified number of features based on Lasso path coefficients.

New in version 0.0.2.
It may be the case that the specified number of features cannot be selected because the Lasso path does not yield enough non-zero coefficients, in which case the largest possible number of features (smaller than the specified number) is returned. If all of the features are assigned 0 weight, or all of the paths have a number of non-zero coefficients larger than the specified number, all of the features are selected. If the exact number of features specified by the user cannot be selected, an appropriate message is logged. Also, if the value of features_percentage results in selecting 0 features, 1 feature is selected and a warning is logged.

The weights provided as the input parameter are incorporated into the feature selection process by centering the dataset around its weighted average (if no weights are provided, the average is simply not weighted) and scaling it by the square root of the weights. The target array is treated in the same way.

This feature selection method is based on LIME (Local Interpretable Model-agnostic Explanations). The original implementation can be found in the lime.lime_base.LimeBase.feature_selection method in the official LIME package.

- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array holding a data set.
- target : numpy.ndarray
The class/probability/regression values of each row in the input data set.
- weights : numpy.ndarray, optional (default=None)
An array of (importance) weights for each data point in the input data set. If None, all of the data points are equally important when computing the Lasso path.
- features_number : integer, optional (default=None)
The number of (top) features to be selected. If None, the top x% of the features are selected where x is given by the features_percentage parameter. It may be the case that the specified number of features cannot be extracted, in which case a warning is logged and the next biggest subset of features is selected.
- features_percentage : integer, optional (default=100)
The percentage of (top) features to be selected. By default all of the features are returned if features_number is None.
- Returns
- feature_indices : numpy.ndarray
Array with indices of features selected by the Lasso path.
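
The fallback rules described above (largest subset not exceeding the requested size, otherwise all features) can be illustrated with a minimal sketch. It is not the library's code: the helper name select_from_path and the assumed layout of path_coefficients (one row per feature, one column per step of the path) are illustrative assumptions only.

```python
import numpy as np


def select_from_path(path_coefficients, features_number):
    """Hypothetical helper: pick feature indices from Lasso path coefficients.

    path_coefficients is assumed to have shape (n_features, n_steps).
    """
    n_features = path_coefficients.shape[0]
    best = None
    # Inspect every step of the path, from most to least regularised.
    for step in path_coefficients.T:
        nonzero = np.flatnonzero(step)
        # Keep the largest non-empty subset that does not exceed the request;
        # on ties the later (less regularised) step wins.
        if 0 < nonzero.size <= features_number:
            if best is None or nonzero.size >= best.size:
                best = nonzero
    if best is None:
        # All coefficients are zero, or every step has too many non-zero
        # coefficients: fall back to selecting all of the features.
        return np.arange(n_features)
    return best


# Toy path in which the number of non-zero coefficients jumps from 1 to 3.
toy_path = np.array([[0.0, 0.5, 0.6],
                     [0.0, 0.0, 0.3],
                     [0.0, 0.0, 0.2]])
exact = select_from_path(toy_path, 3)      # all three features fit
fallback = select_from_path(toy_path, 2)   # only a 1-feature subset fits
```

In this toy path no step has exactly two non-zero coefficients, so asking for two features yields the single-feature subset, which is the situation in which the documentation says a message is logged.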
- Raises
- IncorrectShapeError
The dataset array is not 2-dimensional. The target array is not 1-dimensional. The number of elements in the target array is different than the number of samples in the dataset array. The weights array is not 1-dimensional. The number of weights in the weights array does not agree with the number of samples in the dataset array.
- TypeError
One of the dataset, target or weights array is not purely numerical. The features_number parameter is not an integer. The features_percentage parameter is not an integer.
- ValueError
The features_number parameter is not a positive integer. The features_percentage parameter is outside of the allowed range 0–100 (inclusive).
- Warns
- UserWarning
The specified features_number is larger than the number of features in the dataset array; all of the features are selected.
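
The way the weights enter the selection process (centering around the weighted average, then scaling by the square root of the weights) can be sketched with plain NumPy. This is a minimal sketch of the description above, not the library's implementation; the helper name weighted_centre_and_scale is hypothetical, and treating missing weights as an array of ones is an assumption that reproduces the unweighted average.

```python
import numpy as np


def weighted_centre_and_scale(dataset, target, weights=None):
    """Hypothetical helper: centre and scale data as described above."""
    dataset = np.asarray(dataset, dtype=float)
    target = np.asarray(target, dtype=float)
    if weights is None:
        # Assumption: equal weights give the plain (unweighted) average.
        weights = np.ones(dataset.shape[0])
    weights = np.asarray(weights, dtype=float)

    # Weighted average of every feature column and of the target.
    dataset_mean = np.average(dataset, axis=0, weights=weights)
    target_mean = np.average(target, weights=weights)

    # Centre, then scale every row by the square root of its weight.
    sqrt_weights = np.sqrt(weights)
    scaled_dataset = (dataset - dataset_mean) * sqrt_weights[:, np.newaxis]
    scaled_target = (target - target_mean) * sqrt_weights
    return scaled_dataset, scaled_target


toy_dataset = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
toy_target = np.array([1.0, 2.0, 3.0])
toy_weights = np.array([1.0, 1.0, 4.0])

scaled_dataset, scaled_target = weighted_centre_and_scale(
    toy_dataset, toy_target, toy_weights)
```

The transformed arrays could then be passed to a Lasso path solver. Multiplying by the square root of the weights makes an ordinary squared-error fit on the transformed data equivalent to a weighted squared-error fit on the centred data.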