fatf.fairness.data.measures.systemic_bias

fatf.fairness.data.measures.systemic_bias(dataset: numpy.ndarray, ground_truth: numpy.ndarray, protected_features: List[Union[str, int]]) → numpy.ndarray

Checks for systemic bias in a dataset.

This function looks for pairs of data points that share the same values for all unprotected features but differ in their protected features. The labels (ground truth) of each such pair are compared and, if they differ, the pair is flagged as biased. The result is a square, boolean numpy array whose entry for a given pair of data points is True when systemic bias exists for that pair.
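The pairwise check described above can be sketched with plain numpy broadcasting. This is an illustrative re-implementation, not the library's own code: the name systemic_bias_sketch and the toy data are assumptions, and the sketch only handles a plain (non-structured) 2-dimensional array with integer column indices.

```python
import numpy as np


def systemic_bias_sketch(dataset, ground_truth, protected_features):
    """Illustrative sketch of the documented check (hypothetical helper)."""
    dataset = np.asarray(dataset)
    ground_truth = np.asarray(ground_truth)
    n_cols = dataset.shape[1]

    protected = list(protected_features)
    unprotected = [i for i in range(n_cols) if i not in protected]

    unprot = dataset[:, unprotected]
    prot = dataset[:, protected]

    # Compare every row with every other row: results have shape (n, n).
    same_unprotected = (unprot[:, None, :] == unprot[None, :, :]).all(axis=2)
    diff_protected = (prot[:, None, :] != prot[None, :, :]).any(axis=2)
    diff_label = ground_truth[:, None] != ground_truth[None, :]

    # A pair is flagged only when all three conditions hold.
    return same_unprotected & diff_protected & diff_label


# Toy data (assumed for illustration); column 1 is the protected feature.
toy_dataset = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 2],
    [1, 1, 2],
])
toy_labels = np.array([0, 1, 1, 1])

toy_matrix = systemic_bias_sketch(toy_dataset, toy_labels, [1])
print(toy_matrix)
```

In the toy data, rows 0 and 1 agree on the unprotected columns, differ on the protected one and carry different labels, so only that pair is flagged. Rows 2 and 3 differ on the protected column as well, but share a label and are therefore not flagged.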

Parameters
dataset : numpy.ndarray

A dataset to be evaluated for systemic bias.

ground_truth : numpy.ndarray

The labels corresponding to the dataset.

protected_features : List[column index]

A list of column indices in the dataset that hold protected attributes.

Returns
systemic_bias_matrix : numpy.ndarray

A square, symmetric, boolean numpy array that indicates which pairs of data points share the same unprotected features but differ in protected features and in the ground truth annotation.
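Because the returned matrix is symmetric, each biased pair appears twice, once as (i, j) and once as (j, i). A minimal sketch of how such a matrix might be read is given below. The matrix is written out by hand for illustration rather than computed, and the variable names are assumptions.

```python
import numpy as np

# Hand-written example of a symmetric boolean matrix of the kind returned:
# data points 0 and 1 form the only flagged pair.
bias_matrix = np.array([
    [False, True, False, False],
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
])

# True if at least one pair of data points is flagged.
has_bias = bool(bias_matrix.any())

# The strict upper triangle lists every flagged pair exactly once.
pair_indices = np.argwhere(np.triu(bias_matrix, k=1))
biased_pairs = [tuple(int(i) for i in pair) for pair in pair_indices]

print(has_bias, biased_pairs)
```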

Raises
IncorrectShapeError

The dataset is not a 2-dimensional numpy array, the ground truth is not a 1-dimensional numpy array or the number of rows in the dataset is not equal to the number of elements in the ground truth array.

IndexError

Some of the column indices given in the protected_features list are not valid for the input dataset.

TypeError

The protected_features parameter is not a list.

ValueError

There are duplicate values in the protected feature indices list.
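The error conditions listed above can be summarised as a small validation routine. The sketch below is an assumption-laden illustration, not the library's validation code: check_inputs is a hypothetical helper, the IncorrectShapeError class is a local stand-in for the exception of the same name, the order of the checks is arbitrary, and only plain arrays with integer column indices are considered.

```python
import numpy as np


class IncorrectShapeError(Exception):
    """Local stand-in for the exception named in the list above."""


def check_inputs(dataset, ground_truth, protected_features):
    """Hypothetical helper mirroring the documented error conditions."""
    if dataset.ndim != 2:
        raise IncorrectShapeError('The dataset must be 2-dimensional.')
    if ground_truth.ndim != 1:
        raise IncorrectShapeError('The ground truth must be 1-dimensional.')
    if dataset.shape[0] != ground_truth.shape[0]:
        raise IncorrectShapeError(
            'The number of rows and the number of labels differ.')
    if not isinstance(protected_features, list):
        raise TypeError('The protected_features parameter must be a list.')
    if len(set(protected_features)) != len(protected_features):
        raise ValueError('The protected feature indices contain duplicates.')
    n_cols = dataset.shape[1]
    invalid = [i for i in protected_features
               if not isinstance(i, int) or not 0 <= i < n_cols]
    if invalid:
        raise IndexError('Invalid column indices: {}.'.format(invalid))


# A valid call passes silently.
check_inputs(np.zeros((3, 2)), np.zeros(3), [1])
```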

Examples using fatf.fairness.data.measures.systemic_bias