.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_sphinx_gallery_auto_accountability_xmpl_accountability_data_measure.py>` to download the full example code or run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_sphinx_gallery_auto_accountability_xmpl_accountability_data_measure.py:

===================================================
Measuring Robustness of a Data Set -- Sampling Bias
===================================================

This example illustrates how to identify Sampling Bias in a data set for
groupings defined on a selected feature.

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    The counts for groups defined on "petal length (cm)" feature (index 2) are:
     * For the population split *x <= 2.5* there are: 50 data points.
     * For the population split *2.5 < x <= 4.75* there are: 45 data points.
     * For the population split *4.75 < x* there are: 55 data points.

    The Sampling Bias for *petal length (cm)* feature (index 2) grouping is:
     * For "x <= 2.5" and "2.5 < x <= 4.75" groupings there >is no< Sampling Bias.
     * For "x <= 2.5" and "4.75 < x" groupings there >is no< Sampling Bias.
     * For "2.5 < x <= 4.75" and "4.75 < x" groupings there >is< Sampling Bias.

|

.. code-block:: default

    # Author: Kacper Sokol
    # License: new BSD

    import fatf.utils.data.datasets as fatf_datasets

    import fatf.accountability.data.measures as fatf_dam

    print(__doc__)

    # Load data
    iris_data_dict = fatf_datasets.load_iris()
    iris_X = iris_data_dict['data']
    iris_y = iris_data_dict['target'].astype(int)
    iris_feature_names = iris_data_dict['feature_names']
    iris_class_names = iris_data_dict['target_names']

    # Select the feature for which the Sampling Bias will be measured
    selected_feature_index = 2
    selected_feature_name = iris_feature_names[selected_feature_index]

    # Define grouping (split points) on the selected feature
    selected_feature_grouping = [2.5, 4.75]

    # Get counts, weights and names of the specified grouping
    grp_counts, grp_weights, grp_names = fatf_dam.sampling_bias(
        iris_X, selected_feature_index, selected_feature_grouping)

    # Print out counts per group
    print('The counts for groups defined on "{}" feature (index {}) are:'
          ''.format(selected_feature_name, selected_feature_index))
    for g_name, g_count in zip(grp_names, grp_counts):
        is_are = 'is' if g_count == 1 else 'are'
        print(' * For the population split *{}* there {}: '
              '{} data points.'.format(g_name, is_are, g_count))

    # Get the disparity grid
    bias_grid = fatf_dam.sampling_bias_grid_check(grp_counts)

    # Print out disparity for every pair of groupings
    print('\nThe Sampling Bias for *{}* feature (index {}) grouping is:'
          ''.format(selected_feature_name, selected_feature_index))
    for grouping_i, grouping_name_i in enumerate(grp_names):
        j_offset = grouping_i + 1
        # Only visit pairs above the diagonal so each pair is reported once
        for grouping_j, grouping_name_j in enumerate(grp_names[j_offset:]):
            grouping_j += j_offset
            is_not = '' if bias_grid[grouping_i, grouping_j] else ' no'

            print(' * For "{}" and "{}" groupings there >is{}< Sampling Bias.'
                  ''.format(grouping_name_i, grouping_name_j, is_not))

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.112 seconds)

.. _sphx_glr_download_sphinx_gallery_auto_accountability_xmpl_accountability_data_measure.py:

.. only :: html

    .. container:: sphx-glr-footer
        :class: sphx-glr-footer-example

        .. container:: binder-badge

            .. image:: https://mybinder.org/badge_logo.svg
                :target: https://mybinder.org/v2/gh/fat-forensics/fat-forensics-doc/master?filepath=notebooks/sphinx_gallery_auto/accountability/xmpl_accountability_data_measure.ipynb
                :width: 150 px

        .. container:: sphx-glr-download

            :download:`Download Python source code: xmpl_accountability_data_measure.py <xmpl_accountability_data_measure.py>`

        .. container:: sphx-glr-download

            :download:`Download Jupyter notebook: xmpl_accountability_data_measure.ipynb <xmpl_accountability_data_measure.ipynb>`

.. only:: html

    .. rst-class:: sphx-glr-signature

        Gallery generated by Sphinx-Gallery