fatf.accountability.data.measures.sampling_bias_indexed
fatf.accountability.data.measures.sampling_bias_indexed(indices_per_bin: List[List[int]]) → Tuple[List[int], numpy.ndarray]
Computes information needed for evaluating and remedying sampling bias.
Computes the number of instances per sub-population from the row indices assigned to each sub-population, together with weights that can be used for cost-sensitive learning to mitigate the sampling bias.
This function is an alternative to the fatf.accountability.data.measures.sampling_bias function, for use when the desired instance binning is already available.
For warnings and errors raised by this function, please see the documentation of the fatf.utils.data.tools.validate_indices_per_bin function.
- Parameters
- indices_per_bin : List[List[integer]]
A list of lists, where each inner list holds the row indices of one group (sub-population).
- Returns
- counts : List[integer]
The number of data points in each sub-population (bin) given in indices_per_bin.
- weights : numpy.ndarray
A weight for every instance in the input dataset that could be grouped, i.e. assigned to one of the sub-populations. The weights are useful for training a cost-sensitive classifier to mitigate the sampling bias. Each weight is inversely proportional to the number of instances in the sub-population that the instance belongs to.
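The relation between the input binning and the two returned values can be illustrated with a small pure-Python sketch. It is not the fatf implementation: the helper name `sampling_bias_indexed_sketch` is hypothetical, it returns a plain list rather than a numpy.ndarray, and it assumes that the weight of an instance is exactly 1 / (size of its bin), that the dataset has as many rows as the largest index plus one, and that rows belonging to no bin receive NaN. The library may handle these details differently.

```python
from typing import List, Tuple


def sampling_bias_indexed_sketch(
        indices_per_bin: List[List[int]]) -> Tuple[List[int], List[float]]:
    """Illustrative stand-in only; NOT the fatf implementation."""
    # Number of instances in each sub-population (bin).
    counts = [len(indices) for indices in indices_per_bin]

    # Assumption: the dataset has (largest row index + 1) rows.
    all_indices = [i for indices in indices_per_bin for i in indices]
    n_rows = max(all_indices) + 1 if all_indices else 0

    # Assumption: rows that are not assigned to any bin get NaN.
    weights = [float('nan')] * n_rows
    for indices in indices_per_bin:
        for i in indices:
            # Assumption: weight = 1 / (size of the bin the row belongs to),
            # so rows from smaller bins receive larger weights.
            weights[i] = 1.0 / len(indices)
    return counts, weights


# Three sub-populations over rows 0..7; row 4 is deliberately left ungrouped.
indices_per_bin = [[0, 1, 2, 5], [3, 6], [7]]
counts, weights = sampling_bias_indexed_sketch(indices_per_bin)
print(counts)
print(weights)
```

With this binning the under-represented groups end up with larger per-instance weights. Weights of this kind could, for instance, be passed as the `sample_weight` argument of estimators that accept one, after handling any ungrouped rows.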