fatf.utils.data.transformation.dataset_row_masking

fatf.utils.data.transformation.dataset_row_masking(dataset: numpy.ndarray, data_row: Union[numpy.ndarray, numpy.void]) → numpy.ndarray

Creates a binary representation of the dataset by masking its rows.

New in version 0.0.2.

The rows of the dataset array are compared against the specified data_row to determine which feature values are the same and which are different. Matching values are represented as 1 in the binary output and differing values as 0.

For a ['a', 'b'] data_row and a [['x', 'b'], ['a', 'b'], ['a', 'x']] dataset, the binary representation would be [[0, 1], [1, 1], [1, 0]].
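For unstructured (plain) arrays, the masking described above can be reproduced with NumPy broadcasting. The sketch below is a minimal illustration, not the library's implementation; the helper name mask_rows is hypothetical. It reproduces the example above and returns an array of numpy.int8 type.

```python
import numpy as np


def mask_rows(dataset: np.ndarray, data_row: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of row masking for unstructured 2-D arrays."""
    # Broadcasting compares every row of ``dataset`` with ``data_row``
    # element-wise, giving True where the feature values match.
    matches = dataset == data_row
    # Encode matches as 1 and mismatches as 0 using the int8 dtype.
    return matches.astype(np.int8)


data_row = np.array(['a', 'b'])
dataset = np.array([['x', 'b'], ['a', 'b'], ['a', 'x']])
binary = mask_rows(dataset, data_row)
print(binary)
```

The printed array is [[0 1] [1 1] [1 0]], matching the binary representation given in the example.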

Parameters
dataset : numpy.ndarray

A 2-dimensional numpy array used to generate the binary representation.

data_row : Union[numpy.ndarray, numpy.void]

A 1-dimensional numpy array (for unstructured datasets) or a numpy.void (for rows of structured datasets) containing the feature values that will be compared against the dataset rows.
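For structured arrays, a single row is a numpy.void, and comparison has to happen field by field. The sketch below assumes the binary output has one int8 column per field, in field order; that layout and the helper name mask_structured_rows are illustrative assumptions, not taken from the library source.

```python
import numpy as np


def mask_structured_rows(dataset: np.ndarray, data_row: np.void) -> np.ndarray:
    """Hypothetical sketch of row masking for structured arrays.

    Assumes one output column per field of the structured dtype.
    """
    names = dataset.dtype.names
    binary = np.zeros((dataset.shape[0], len(names)), dtype=np.int8)
    for i, name in enumerate(names):
        # Compare the whole field column with the same field of the row;
        # the boolean result is cast to 0/1 when stored in the int8 array.
        binary[:, i] = dataset[name] == data_row[name]
    return binary


dtype = [('colour', 'U5'), ('size', 'i4')]
dataset = np.array([('red', 1), ('blue', 1), ('red', 2)], dtype=dtype)
data_row = dataset[0]  # indexing a structured array yields a numpy.void
print(type(data_row))
print(mask_structured_rows(dataset, data_row))
```

Here data_row is ('red', 1), so the first row matches on both fields, the second only on size and the third only on colour.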

Returns
binary_representation : numpy.ndarray

A binary (0’s and 1’s in an array of numpy.int8 type) representation of the dataset (with the same shape as dataset) achieved by “masking” it with the data_row.

Raises
IncorrectShapeError

The dataset is not a 2-dimensional array, the data_row is not a 1-dimensional array, or the length of the data_row is different from the number of columns in the dataset.
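The shape conditions listed above can be sketched for unstructured arrays as a few explicit checks. This is an illustrative sketch only: the IncorrectShapeError class below is a local stand-in defined for the example (it is not imported from fatf), the helper name check_shapes is hypothetical, and structured arrays are not handled.

```python
import numpy as np


class IncorrectShapeError(Exception):
    """Local stand-in for fatf's IncorrectShapeError (not imported from fatf)."""


def check_shapes(dataset: np.ndarray, data_row: np.ndarray) -> None:
    """Hypothetical sketch of the shape checks for unstructured arrays."""
    if dataset.ndim != 2:
        raise IncorrectShapeError('The dataset must be a 2-dimensional array.')
    if data_row.ndim != 1:
        raise IncorrectShapeError('The data_row must be a 1-dimensional array.')
    if data_row.shape[0] != dataset.shape[1]:
        raise IncorrectShapeError(
            'The data_row length must equal the number of dataset columns.')


# A 3-element row cannot be compared with a dataset that has 2 columns.
try:
    check_shapes(np.zeros((3, 2)), np.zeros(3))
except IncorrectShapeError as error:
    print(error)
```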

TypeError

The dataset is not of a base type or the data_row’s dtype is too different from the dataset’s dtype.