simpleml.datasets.numpy.base
Numpy Module for external “dataframe”
Inherit and extend for particular patterns. It is a bit of a misnomer to use the term “dataframe”, since there are very few expected attributes and they are by no means unique to pandas.
Module Contents
Classes
BaseNumpyDataset | Assumes self.dataframe is a dictionary of numpy ndarrays
Attributes
- class simpleml.datasets.numpy.base.BaseNumpyDataset(*args, **kwargs)
Bases: simpleml.datasets.base_dataset.Dataset
Assumes self.dataframe is a dictionary of numpy ndarrays
param label_columns: Optional list of column names to register as the “y” split section
param other_named_split_sections: Optional map of section names to lists of column names for other arbitrary split columns. Section names must match the expected consumer signatures (e.g. sample_weights) because the sections are passed through untouched downstream (e.g. sklearn.fit(**split))
All other columns in the dataframe will automatically be referenced as “X”
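As a rough illustration of the partitioning described above, the following sketch splits a dictionary of ndarrays into “X”, “y”, and extra named sections. The helper `split_sections`, the toy `fit` consumer, and the column names are hypothetical and not part of simpleml; the sketch assumes that “columns” are simply the dictionary keys.

```python
import numpy as np


def split_sections(dataframe, label_columns=None, other_named_split_sections=None):
    """Hypothetical helper (not simpleml API): partition a dict of ndarrays."""
    label_columns = label_columns or []
    other_named_split_sections = other_named_split_sections or {}

    # Label columns become the "y" section.
    sections = {"y": {c: dataframe[c] for c in label_columns}}
    claimed = set(label_columns)

    # Each extra section keeps the name it was registered under.
    for name, columns in other_named_split_sections.items():
        sections[name] = {c: dataframe[c] for c in columns}
        claimed.update(columns)

    # Every column not claimed by a named section falls through to "X".
    sections["X"] = {c: v for c, v in dataframe.items() if c not in claimed}
    return sections


def fit(X, y, sample_weights=None):
    """Toy consumer standing in for a downstream fit(**split) call."""
    return sorted(X), sorted(y), sample_weights is not None


dataframe = {
    "age": np.array([31, 45, 27]),
    "income": np.array([40.0, 85.5, 32.1]),
    "churned": np.array([0, 1, 0]),
    "weight": np.array([1.0, 0.5, 2.0]),
}
split = split_sections(
    dataframe,
    label_columns=["churned"],
    other_named_split_sections={"sample_weights": ["weight"]},
)
result = fit(**split)  # section names must match the consumer's keyword arguments
```

Because the sections are unpacked as keyword arguments, a section registered under a name the consumer does not accept would raise a `TypeError` in this sketch, which is why the names have to match the consumer's signature.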
- property X(self)
Return the subset that isn’t in the target labels
- Return type
numpy.ndarray
- get(self, column, split)
Explicitly retrieve one section (column) of a given validation split. Assumes self.dataframe has a get method that returns a dictionary of {‘X’: X, ‘y’: y}. Uses self.label_columns if y is named something else, looking only at the first entry in the list.
Returns None for any combination of column/split that isn’t present.
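The lookup behavior described here can be sketched roughly as follows. This is one interpretation, not the library's implementation: `get_section`, the split names, and the nested `{split: {column: ndarray}}` layout are assumptions made for illustration.

```python
import numpy as np


def get_section(dataframe, column, split, label_columns=None):
    """Hypothetical lookup (not simpleml API) over {split: {column: ndarray}}."""
    split_data = dataframe.get(split)
    if split_data is None:
        return None  # unknown split

    # If "y" is requested but stored under another name, fall back to the
    # first registered label column only.
    if column == "y" and "y" not in split_data and label_columns:
        column = label_columns[0]

    return split_data.get(column)  # None when the column is absent


dataframe = {
    "TRAIN": {"X": np.arange(6).reshape(3, 2), "target": np.array([0, 1, 0])},
    "TEST": {"X": np.arange(4).reshape(2, 2)},
}

train_y = get_section(dataframe, "y", "TRAIN", label_columns=["target"])
test_y = get_section(dataframe, "y", "TEST", label_columns=["target"])
missing = get_section(dataframe, "X", "VALIDATION")
```

In this sketch `train_y` resolves through the first label column, while `test_y` and `missing` are both `None` because the requested column or split is absent.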
- get_feature_names(self)
Should return a list of the features in the dataset
- Return type
List[str]
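A minimal sketch of how feature names and an “X” array could be derived from a dictionary of ndarrays is shown below. The functions `feature_names` and `feature_matrix` are hypothetical, and the sketch assumes every column is a 1-D array of equal length; the documentation above does not state how subclasses actually assemble X.

```python
import numpy as np


def feature_names(dataframe, label_columns):
    """Hypothetical: names of all non-label columns, in dictionary order."""
    labels = set(label_columns)
    return [name for name in dataframe if name not in labels]


def feature_matrix(dataframe, label_columns):
    """Hypothetical: stack 1-D feature columns into an (n_rows, n_features) array."""
    names = feature_names(dataframe, label_columns)
    return np.column_stack([dataframe[name] for name in names])


dataframe = {
    "age": np.array([31, 45, 27]),
    "income": np.array([40.0, 85.5, 32.1]),
    "churned": np.array([0, 1, 0]),
}

names = feature_names(dataframe, label_columns=["churned"])
X = feature_matrix(dataframe, label_columns=["churned"])
```

Keeping the column order of `X` tied to the list returned by `feature_names` means position `i` in the matrix can always be mapped back to a name.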