fatf.utils.data.datasets.load_data
- fatf.utils.data.datasets.load_data(file_path: str, dtype: Union[None, type, numpy.dtype, str, List[Tuple[str, str]], List[Tuple[str, numpy.dtype]]] = None, feature_names: List[str] = None) → Dict[str, numpy.ndarray]
- Loads a dataset from a file.
- The dataset file must be formatted in the comma-separated values (csv) standard, with a comma (,) used as the delimiter. The first row of the file must be a header formatted as n_samples,n_features,class_name_1,class_name_2,.... For example, the header 150,5,red,green,blue,black indicates that there are 150 data points with 5 features and 4 possible classes: red, green, blue and black. The classes should be given in an order that matches the lexicographical ordering of the unique class values. For example, given that the class values in the data are 3, 2, 4 and 1, the assignment would be: 1–red, 2–green, 3–blue and 4–black.
- The rest of the csv file is treated as a data array, with the last column treated as the target (class) variable. The type of each column is inferred if the dtype parameter is set to None; otherwise the array is cast to the provided dtype. If the columns in the data are of different types, or the user-provided dtype defines columns of multiple types, a structured numpy array is used to represent the data.
- Parameters
- file_path : string
- Path to the csv data file. 
- dtype : Union[type, numpy.dtype, string, List[Tuple[string, string]], List[Tuple[string, type]], List[Tuple[string, numpy.dtype]]], optional (default=None)
- The dtype(s) used to read the csv data. Defaults to None, in which case the types are inferred. The user can provide either a single type for the whole array (a built-in Python type, a numpy dtype or a string representation of a numpy dtype) or a list of tuples giving the name (string) and type (see above) of every column in the data array. In the latter case the user may provide types for the whole dataset, including the target column, or only for the columns representing features.
- feature_names : List[string], optional (default=None)
- List of strings representing the feature names. Defaults to None, in which case features are given default names (‘feature_0’, etc.) or, if a structured dtype parameter is provided, the names given in the dtype parameter are used.
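The file layout expected at file_path can be illustrated with a short, standard-library-only sketch. The helper names (build_demo_file, read_header, map_classes) are hypothetical and are not part of fatf. The sketch assumes that n_features counts only the feature columns, not the trailing target column, and it approximates the lexicographical class ordering by sorting the values as strings.

```python
import csv
import os
import tempfile


def build_demo_file():
    """Write a tiny csv file in the documented layout and return its path."""
    rows = [
        # Header: n_samples, n_features, then one name per class.
        [3, 2, 'red', 'green'],
        # Data rows: two feature columns followed by the target column.
        [5.1, 3.5, 1],
        [4.9, 3.0, 2],
        [6.2, 2.9, 1],
    ]
    path = os.path.join(tempfile.mkdtemp(), 'toy.csv')
    with open(path, 'w', newline='') as handle:
        csv.writer(handle).writerows(rows)
    return path


def read_header(path):
    """Return (n_samples, n_features, class_names) read from the first row."""
    with open(path, newline='') as handle:
        header = next(csv.reader(handle))
    return int(header[0]), int(header[1]), header[2:]


def map_classes(class_values, class_names):
    """Pair the unique class values, sorted as strings, with the class names.

    Sorting by the string form is an assumption made for this sketch.
    """
    unique = sorted(set(class_values), key=str)
    return dict(zip(unique, class_names))
```

With the class values 3, 2, 4 and 1 and the names red, green, blue and black, map_classes pairs 1 with red, 2 with green, 3 with blue and 4 with black, which matches the assignment described for the header.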
 
- Returns
- data : Dict[string, numpy.ndarray]
- A dictionary representation of the dataset storing all the relevant information under the following keys: ‘data’, ‘target’, ‘target_names’, ‘feature_names’. 
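A possible way to call the function and read the returned dictionary is sketched below. The file contents, the column names (length, count) and the helper names make_demo_csv and load_demo are made up for illustration. The fatf import happens inside load_demo, so that function needs fatf to be installed.

```python
import os
import tempfile


def make_demo_csv():
    """Create a small csv file in the documented layout and return its path."""
    lines = [
        '4,2,negative,positive',  # header: n_samples, n_features, class names
        '1.0,10,0',
        '2.5,20,1',
        '3.0,30,0',
        '4.5,40,1',
    ]
    path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
    with open(path, 'w') as handle:
        handle.write('\n'.join(lines) + '\n')
    return path


def load_demo(path):
    """Load the file with fatf (imported lazily; requires fatf)."""
    import fatf.utils.data.datasets as fatf_datasets

    # A list of (name, type) tuples covering only the feature columns,
    # one of the dtype forms listed in the Parameters section.
    dtype = [('length', 'f8'), ('count', 'i8')]
    dataset = fatf_datasets.load_data(path, dtype=dtype)

    # Pick out the four keys listed in the Returns section.
    keys = ('data', 'target', 'target_names', 'feature_names')
    return {key: dataset[key] for key in keys}
```

Because the dtype list names the columns, feature_names is left at its default here; the Raises section states that supplying names through both parameters is an error.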
 
- Raises
- TypeError
- Raised if one of the feature names in the feature_names parameter is not a string; the feature_names parameter is neither of the allowed types (None or a list); the first element of one of the dtype tuples is not a string; or the dtype parameter is none of the allowed types (None, a list of tuples, a built-in Python type, a numpy dtype or a string representation of a numpy dtype).
- ValueError
- Raised if the number of feature names is inconsistent with the data header; the feature names are provided in both the feature_names and dtype parameters; a tuple in the list of complex dtypes is malformed; or the number of type definitions in the dtype parameter is inconsistent with the number of features in the dataset.
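Callers that want to report these two exception types separately could wrap the call as in the sketch below. The wrapper try_load is a hypothetical helper, not part of fatf. It takes the loader as an argument so that it can be exercised without fatf installed.

```python
def try_load(loader, file_path, **kwargs):
    """Call loader(file_path, **kwargs).

    Returns (dataset, None) on success or (None, message) if the loader
    raises TypeError or ValueError. Other exceptions propagate unchanged.
    """
    try:
        return loader(file_path, **kwargs), None
    except TypeError as error:
        # For load_data: a wrongly typed feature_names or dtype argument.
        return None, 'invalid argument type: {}'.format(error)
    except ValueError as error:
        # For load_data: arguments inconsistent with each other or the header.
        return None, 'inconsistent arguments: {}'.format(error)
```

In practice the loader would be fatf.utils.data.datasets.load_data, for example try_load(load_data, 'data.csv', feature_names=['a', 'b']), where the path and the names are placeholders.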