fatf.accountability.data.measures.sampling_bias_indexed

fatf.accountability.data.measures.sampling_bias_indexed(indices_per_bin: List[List[int]]) → Tuple[List[int], numpy.ndarray]

Computes information needed for evaluating and remedying sampling bias.

Computes the number of instances in each sub-population (from the number of indices supplied for it) and the per-instance weights that can be used for cost-sensitive learning to mitigate the sampling bias.
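The computation described above can be sketched in NumPy as follows. This is a minimal illustration, not fatf's implementation: the function name `sampling_bias_indexed_sketch` is hypothetical, and it assumes that the inner lists jointly contain every row index from 0 to n - 1 exactly once and that each weight is simply 1 / (size of its sub-population). The documentation only states that the weights are inversely proportional to the sub-population size, so fatf may scale them differently.

```python
from typing import List, Tuple

import numpy as np


def sampling_bias_indexed_sketch(
        indices_per_bin: List[List[int]]) -> Tuple[List[int], np.ndarray]:
    """Hypothetical stand-in for sampling_bias_indexed (illustration only)."""
    # Assumption: the bins jointly hold every row index 0..n-1 exactly once.
    flat = sorted(i for bin_indices in indices_per_bin for i in bin_indices)
    if flat != list(range(len(flat))):
        raise ValueError('Bins must partition the row indices 0..n-1.')

    # Number of instances in every sub-population (bin).
    counts = [len(bin_indices) for bin_indices in indices_per_bin]

    weights = np.zeros(len(flat), dtype=float)
    for bin_indices, count in zip(indices_per_bin, counts):
        if count:  # empty bins contain no instances, hence no weights to set
            # Assumed scaling: 1 / bin size, i.e. inversely proportional.
            weights[bin_indices] = 1.0 / count
    return counts, weights


counts, weights = sampling_bias_indexed_sketch([[0, 2, 4, 5], [1, 3]])
print(counts)            # [4, 2]
print(weights.tolist())  # [0.25, 0.5, 0.25, 0.5, 0.25, 0.25]
```

Instances of the smaller sub-population (rows 1 and 3) receive twice the weight of those in the larger one, which is what makes the weights useful for rebalancing.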

This is an alternative to the fatf.accountability.data.measures.sampling_bias function that can be used when the desired instance binning is already available.
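When the binning is known in advance, for instance one bin per category of a feature, the indices_per_bin argument can be built directly. The helper `bin_by_value` below is hypothetical and not part of fatf; it groups row indices by the distinct values of a single column and returns them in the List[List[int]] shape that sampling_bias_indexed expects.

```python
from typing import List

import numpy as np


def bin_by_value(column: np.ndarray) -> List[List[int]]:
    """Hypothetical helper: one bin of row indices per distinct value."""
    # inverse[i] is the position of column[i] among the sorted unique values.
    unique_values, inverse = np.unique(column, return_inverse=True)
    return [np.flatnonzero(inverse == k).tolist()
            for k in range(len(unique_values))]


gender = np.array(['f', 'm', 'f', 'm', 'f', 'f'])
print(bin_by_value(gender))  # [[0, 2, 4, 5], [1, 3]]
```

For a numerical feature, a similar list could be built from interval labels, for example the bin numbers returned by `numpy.digitize`.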

For warnings and errors raised by this function, please see the documentation of the fatf.utils.data.tools.validate_indices_per_bin function.

Parameters
indices_per_bin : List[List[integer]]

A list of lists, where each inner list holds the row indices of one group (sub-population).

Returns
counts : List[integer]

The number of data points in each sub-population, i.e. in each group of indices given in indices_per_bin.

weights : numpy.ndarray

A weight for every instance in the input dataset that could be grouped, i.e. assigned to one of the sub-populations. The weights are useful for training a cost-sensitive classifier that mitigates the sampling bias: each weight is inversely proportional to the number of instances in the sub-population the instance belongs to.
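To illustrate why inverse-proportional weights help, the sketch below uses made-up per-instance losses (hypothetical data) and the assumed 1 / sub-population-size scaling. Every sub-population ends up with the same total weight, so a weighted loss averages the per-sub-population losses instead of being dominated by the largest group.

```python
import numpy as np

# Hypothetical data: bin 0 holds 4 instances, bin 1 holds only 2.
indices_per_bin = [[0, 2, 4, 5], [1, 3]]
per_instance_loss = np.array([0.1, 0.9, 0.1, 0.9, 0.1, 0.1])

# Assumed weighting: 1 / (size of the instance's bin).
weights = np.empty(len(per_instance_loss))
for bin_indices in indices_per_bin:
    weights[bin_indices] = 1.0 / len(bin_indices)

# Each bin now carries the same total weight (1.0 here) ...
bin_totals = [weights[bin_indices].sum() for bin_indices in indices_per_bin]
# ... so the under-represented bin 1 is no longer outweighed by bin 0.
unweighted_loss = per_instance_loss.mean()                      # about 0.367
weighted_loss = np.average(per_instance_loss, weights=weights)  # about 0.5
print(bin_totals, unweighted_loss, weighted_loss)
```

In practice the array would be passed to a learner that accepts per-instance weights; many scikit-learn estimators, for example, take it through the `sample_weight` argument of `fit`. The weights must stay aligned with the rows of the training data.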