fatf.utils.data.feature_selection.sklearn.lasso_path
-
fatf.utils.data.feature_selection.sklearn.lasso_path(dataset: numpy.ndarray, target: numpy.ndarray, weights: Optional[numpy.ndarray] = None, features_number: Optional[int] = None, features_percentage: int = 100) → numpy.ndarray
Selects the specified number of features based on Lasso path coefficients.
New in version 0.0.2.
It may not be possible to select exactly the specified number of features because no step of the Lasso path has that many non-zero coefficients. In that case the largest number of features that is smaller than the specified number is returned. If all of the features are assigned a zero coefficient, or if every step of the path has more non-zero coefficients than the specified number, all of the features are selected. Whenever the exact number of features specified by the user cannot be selected, an appropriate message is logged. Also, if the value of features_percentage results in selecting 0 features, 1 feature is selected and a warning is logged.

The weights provided as an input parameter are incorporated into the feature selection process by centering the dataset around its weighted average (if no weights are provided, the average is simply not weighted) and scaling it by the square root of the weights. The target array is treated in the same way.

This feature selection method is based on LIME (Local Interpretable Model-agnostic Explanations). The original implementation can be found in the lime.lime_base.LimeBase.feature_selection method of the official LIME package.
- Parameters
- dataset : numpy.ndarray
A 2-dimensional numpy array holding a data set.
- target : numpy.ndarray
The class/probability/regression values of each row in the input data set.
- weights : numpy.ndarray, optional (default=None)
An array of (importance) weights for each data point in the input data set. If None, all of the data points are equally important when computing the Lasso path.
- features_number : integer, optional (default=None)
The number of (top) features to be selected. If None, the top x% of the features are selected, where x is given by the features_percentage parameter. If the specified number of features cannot be extracted, a warning is logged and the next largest subset of features is selected.
- features_percentage : integer, optional (default=100)
The percentage of (top) features to be selected. By default, all of the features are returned if features_number is None.
- Returns
- feature_indices : numpy.ndarray
Array with indices of features selected by the Lasso path.
- Raises
- IncorrectShapeError
The dataset array is not 2-dimensional. The target array is not 1-dimensional. The number of elements in the target array is different from the number of samples in the dataset array. The weights array is not 1-dimensional. The number of weights in the weights array does not agree with the number of samples in the dataset array.
- TypeError
One of the dataset, target or weights arrays is not purely numerical. The features_number parameter is not an integer. The features_percentage parameter is not an integer.
- ValueError
The features_number parameter is not a positive integer. The features_percentage parameter is outside of the allowed range 0–100 (inclusive).
- Warns
- UserWarning
The specified features_number is larger than the number of features in the dataset array; all of the features are selected.
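The weighting step described at the top of this page can be sketched in NumPy as follows. weight_data is a hypothetical helper, not part of fatf; it only illustrates centering on weighted averages followed by square-root scaling. Scaling row i by sqrt(w_i) makes the ordinary squared loss on the transformed data equal to the weighted loss sum_i w_i (y_i - x_i b)^2 on the original data, and centering on the weighted means removes the intercept from that weighted problem.

```python
import numpy as np


def weight_data(dataset, target, weights=None):
    """Center dataset and target on their weighted means, scale by sqrt(weights).

    Hypothetical helper illustrating the weighting described above;
    it is not part of the fatf API.
    """
    dataset = np.asarray(dataset, dtype=float)
    target = np.asarray(target, dtype=float)
    if weights is None:
        # Equal weights make np.average an ordinary (unweighted) mean.
        weights = np.ones(dataset.shape[0])
    weights = np.asarray(weights, dtype=float)
    sqrt_weights = np.sqrt(weights)

    # Column-wise weighted mean of the data set, then row-wise scaling.
    centered_dataset = dataset - np.average(dataset, axis=0, weights=weights)
    weighted_dataset = centered_dataset * sqrt_weights[:, np.newaxis]

    # The target array is treated in the same way.
    centered_target = target - np.average(target, weights=weights)
    weighted_target = centered_target * sqrt_weights
    return weighted_dataset, weighted_target


X = np.array([[0.0, 1.0], [2.0, 3.0]])
y = np.array([1.0, 3.0])
wx, wy = weight_data(X, y, weights=np.array([1.0, 3.0]))
# Weighted column means are [1.5, 2.5] and the weighted target mean is 2.5,
# so the second row is [0.5, 0.5] scaled by sqrt(3).
```

After the transformation, the weighted column sums of the centered data are zero, which is what makes an intercept-free Lasso fit appropriate.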
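The interplay between features_number and features_percentage, together with the parameter checks listed under Raises and Warns, can be sketched as below. resolve_features_number is a hypothetical helper, not fatf's code. In particular, rounding the percentage down to a whole number of features is an assumption: the documentation only states that a result of 0 features is raised to 1 with a logged warning.

```python
import logging
import warnings

logger = logging.getLogger(__name__)


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def resolve_features_number(n_features, features_number=None,
                            features_percentage=100):
    """Hypothetical helper: turn the two parameters into a feature count."""
    # Validate both parameters up front, mirroring the Raises section.
    if features_number is not None:
        if not _is_int(features_number):
            raise TypeError('The features_number parameter is not an integer.')
        if features_number < 1:
            raise ValueError(
                'The features_number parameter is not a positive integer.')
    if not _is_int(features_percentage):
        raise TypeError('The features_percentage parameter is not an integer.')
    if not 0 <= features_percentage <= 100:
        raise ValueError('The features_percentage parameter is outside of '
                         'the allowed range 0-100 (inclusive).')

    if features_number is not None:
        if features_number > n_features:
            # Mirrors the Warns section: fall back to all of the features.
            warnings.warn('features_number is larger than the number of '
                          'features; all of the features are selected.',
                          UserWarning)
            return n_features
        return features_number

    # Assumption: the percentage is rounded down to a whole number.
    count = n_features * features_percentage // 100
    if count == 0:
        logger.warning('features_percentage selects 0 features; '
                       '1 feature will be selected instead.')
        count = 1
    return count


print(resolve_features_number(10, features_percentage=25))  # 2
print(resolve_features_number(5, features_percentage=10))   # 1 (bumped up from 0)
```

Validating both parameters before using either one keeps the error behavior independent of which of the two the caller actually relies on.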
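The selection rule described above can be illustrated on a hand-made coefficient path laid out the way scikit-learn's lars_path returns it: one row per feature, one column per step, with column 0 being the all-zero start of the path. select_from_path is a hypothetical name. It walks the path from its least regularized end, returns the first set of at most features_number non-zero coefficients, and otherwise falls back to all of the features. Skipping column 0 mirrors the loop in LIME's feature_selection method and is an assumption about fatf.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def select_from_path(coefficients, features_number):
    """Hypothetical sketch of the selection rule applied to a Lasso path.

    coefficients has shape (n_features, n_steps): column i holds the
    coefficients at step i, with regularization decreasing as i grows.
    """
    n_features = coefficients.shape[0]
    selected = np.arange(n_features)  # fallback: all of the features
    # Walk backwards from the least regularized step (column 0, the all-zero
    # start of the path, is skipped) and stop at the first step with at most
    # features_number non-zero coefficients.
    for i in range(coefficients.shape[1] - 1, 0, -1):
        nonzero = coefficients[:, i].nonzero()[0]
        if nonzero.shape[0] <= features_number:
            selected = nonzero
            break
    if selected.shape[0] == 0:
        # Every coefficient is zero at that step: select all of the features.
        selected = np.arange(n_features)
    if selected.shape[0] != features_number:
        logger.warning('Selected %d features instead of the requested %d.',
                       selected.shape[0], features_number)
    return selected


path = np.array([
    [0.0, 0.5, 0.8, 0.9, 1.0],
    [0.0, 0.0, 0.3, 0.4, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, -0.1, -0.3],
])
print(select_from_path(path, 2))  # [0 1]
print(select_from_path(path, 3))  # [0 1 3]
```

For a path whose non-zero counts jump from 1 straight to 3, asking for 2 features returns the single feature found at the earlier step. This matches the "largest number smaller than the specified number" behavior.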
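A minimal end-to-end sketch follows. It assumes scikit-learn is installed and computes the path with sklearn.linear_model.lars_path(..., method='lasso') on the weighted, centered data. Using lars_path here is an assumption suggested by the sklearn module name. lasso_path_selection is a hypothetical name, this is not fatf's implementation, and parameter validation and logging are omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import lars_path


def lasso_path_selection(dataset, target, features_number, weights=None):
    """Hypothetical end-to-end sketch; not the fatf implementation."""
    dataset = np.asarray(dataset, dtype=float)
    target = np.asarray(target, dtype=float)
    if weights is None:
        weights = np.ones(dataset.shape[0])
    sqrt_weights = np.sqrt(weights)

    # Weighted centering and sqrt-weight scaling; lars_path fits no
    # intercept, so centering takes care of it.
    x = ((dataset - np.average(dataset, axis=0, weights=weights))
         * sqrt_weights[:, np.newaxis])
    y = (target - np.average(target, weights=weights)) * sqrt_weights

    # lars_path returns (alphas, active, coefs); coefs has one column per step.
    _, _, coefs = lars_path(x, y, method='lasso')

    # Backward walk over the path with an all-features fallback.
    selected = np.arange(dataset.shape[1])
    for i in range(coefs.shape[1] - 1, 0, -1):
        nonzero = coefs[:, i].nonzero()[0]
        if nonzero.shape[0] <= features_number:
            selected = nonzero
            break
    if selected.shape[0] == 0:
        selected = np.arange(dataset.shape[1])
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only features 0 and 3 drive the target; the remaining columns are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.01 * rng.normal(size=200)
print(lasso_path_selection(X, y, features_number=2))
```

Passing weights=np.ones(n_samples) gives the same result as leaving weights as None, because equal weights reduce np.average to a plain mean and scale every row by 1.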