fatf.utils.data.feature_selection.sklearn.lasso_path

fatf.utils.data.feature_selection.sklearn.lasso_path(dataset: numpy.ndarray, target: numpy.ndarray, weights: Optional[numpy.ndarray] = None, features_number: Optional[int] = None, features_percentage: int = 100) → List[Union[str, int]]

Selects the specified number of features based on Lasso path coefficients.
New in version 0.0.2.
The specified number of features cannot always be selected because the Lasso path may not yield enough nonzero coefficients; in that case the largest available number of features (smaller than the specified number) is returned. If all of the features are assigned 0 weight, or every point on the path has more nonzero coefficients than the specified number, all of the features are selected. If the exact number of features specified by the user cannot be selected, an appropriate message is logged. Also, if the value of features_percentage results in selecting 0 features, 1 feature is selected and a warning is logged.

The weights provided as the input parameter are incorporated into the feature selection process by centering the dataset around its weighted average (if no weights are provided, the average is simply not weighted) and scaling by the square root of the weights. The target array is treated in the same way.

This feature selection method is based on the default feature selection mechanism implemented by LIME (Local Interpretable Model-agnostic Explanations). The original implementation can be found in the lime.lime_base.LimeBase.feature_selection method in the official LIME package.

 Parameters
 dataset : numpy.ndarray
A 2-dimensional numpy array holding a data set.
 target : numpy.ndarray
The class/probabilities/regression values of each row in the input data set.
 weights : numpy.ndarray, optional (default=None)
An array of (importance) weights for each data point in the input data set. If None, all of the data points are equally important when computing the Lasso path.
 features_number : integer, optional (default=None)
The number of (top) features to be selected. If None, the top x% of the features are selected, where x is given by the features_percentage parameter. It may be the case that the exact number of features cannot be extracted, in which case a warning will be logged and the next biggest subset of features will be selected.
 features_percentage : integer, optional (default=100)
The percentage of (top) features to be selected. By default all of the features are returned if features_number is None.
 Returns
 feature_indices : List[Index]
List of indices indicating features selected by the Lasso path.
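The selection rule described in the summary can be sketched in plain NumPy. This is only an illustration of the documented behaviour, not the library's code: the helper name pick_from_path and the layout of the coefs matrix (one row per feature, one column per point on the path) are assumptions made for this example.

```python
import numpy as np


def pick_from_path(coefs, features_number):
    """Pick feature indices from a Lasso path coefficient matrix.

    Hypothetical helper: ``coefs`` is assumed to have shape
    (n_features, n_steps), with column j holding the coefficients
    at the j-th point on the path.
    """
    n_features = coefs.shape[0]
    # Number of nonzero coefficients at every point on the path.
    nonzero_counts = np.count_nonzero(coefs, axis=0)
    # Points that select at least one feature but no more than requested.
    usable = np.flatnonzero(
        (nonzero_counts > 0) & (nonzero_counts <= features_number))
    if usable.size == 0:
        # Either every coefficient is zero or every nonzero point selects
        # too many features: fall back to all of the features.
        return list(range(n_features))
    # Take the largest admissible subset; on ties use the latest point.
    best_count = nonzero_counts[usable].max()
    best_step = usable[nonzero_counts[usable] == best_count][-1]
    return np.flatnonzero(coefs[:, best_step]).tolist()
```

With a path whose points have 0, 1 and 3 nonzero coefficients, asking for 2 features yields the single-feature subset, which mirrors the "largest number smaller than the specified number" case described above.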
 Raises
 IncorrectShapeError
The dataset array is not 2-dimensional. The target array is not 1-dimensional. The number of labels in the target array is different from the number of samples in the dataset array. The weights array is not 1-dimensional. The number of weights in the weights array does not agree with the number of samples in the dataset array.
 TypeError
One of the dataset, target or weights arrays is not purely numerical. The features_number parameter is not an integer. The features_percentage parameter is not an integer.
 ValueError
The features_number parameter is not a positive integer. The features_percentage parameter is outside of the allowed range 0–100 (inclusive).
 Warns
 UserWarning
The specified features_number is larger than the number of features in the dataset array; all of the features are selected.
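The weighting step mentioned in the description (centering around the weighted average, then scaling by the square root of the weights) can be written out as a short NumPy sketch. The function name weight_and_centre is hypothetical and the code is a reading of the description above, not the library's implementation.

```python
import numpy as np


def weight_and_centre(dataset, target, weights=None):
    """Centre and scale a data set and its target using sample weights.

    Hypothetical helper illustrating the preprocessing described in
    the text; it is not part of the documented API.
    """
    if weights is None:
        # Without weights the average is a plain (unweighted) mean.
        weights = np.ones(dataset.shape[0])
    # Weighted mean of every column and of the target.
    dataset_mean = np.average(dataset, axis=0, weights=weights)
    target_mean = np.average(target, weights=weights)
    # Scale every row (and target value) by the square root of its weight.
    scale = np.sqrt(weights)
    dataset_w = (dataset - dataset_mean) * scale[:, np.newaxis]
    target_w = (target - target_mean) * scale
    return dataset_w, target_w
```

Scaling rows by the square root of the weights means that an ordinary least-squares style fit on the transformed arrays minimises the weighted squared error on the original ones, which is presumably why the weights enter in this form.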