simpleml.datasets.numpy
Dataset Library support for Numpy
Submodules
Package Contents
Classes
BaseNumpyDataset | Assumes self.dataframe is a dictionary of numpy ndarrays
NumpyPipelineDataset | Dataset class with a predefined build routine, assuming dataset pipeline existence.
Attributes
- class simpleml.datasets.numpy.BaseNumpyDataset(*args, **kwargs)
Bases: simpleml.datasets.base_dataset.Dataset
Assumes self.dataframe is a dictionary of numpy ndarrays
- Parameters
label_columns: Optional list of column names to register as the “y” split section.
other_named_split_sections: Optional map of section names to lists of column names for other arbitrary split columns. Section names must match the signatures the consumer expects (e.g. sample_weights), because these sections are passed through untouched downstream (e.g. sklearn.fit(**split)).
All other columns in the dataframe are automatically referenced as “X”.
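A minimal sketch of how these parameters could partition columns into split sections, assuming the dataframe is a flat dict mapping column names to 1-D numpy arrays. build_split and weighted_mean_fit are hypothetical names, not SimpleML API; weighted_mean_fit stands in for an estimator's fit method. scikit-learn estimators spell the weights keyword sample_weight, so that is the section name used here.

```python
from typing import Dict, List, Optional

import numpy as np

Columns = Dict[str, np.ndarray]


def build_split(
    columns: Columns,
    label_columns: Optional[List[str]] = None,
    other_named_split_sections: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, np.ndarray]:
    """Hypothetical helper: group 1-D column arrays into named split sections."""
    label_columns = list(label_columns or [])
    other = dict(other_named_split_sections or {})

    def stack(names: List[str]) -> np.ndarray:
        # A single column stays 1-D; several columns become a 2-D matrix
        if len(names) == 1:
            return columns[names[0]]
        return np.column_stack([columns[name] for name in names])

    # Columns claimed by "y" or any other named section
    claimed = set(label_columns)
    for names in other.values():
        claimed.update(names)

    split: Dict[str, np.ndarray] = {}
    # Every unclaimed column is treated as part of "X"
    x_names = [name for name in columns if name not in claimed]
    if x_names:
        split["X"] = np.column_stack([columns[name] for name in x_names])
    if label_columns:
        split["y"] = stack(label_columns)
    for section, names in other.items():
        split[section] = stack(names)
    return split


def weighted_mean_fit(
    X: np.ndarray, y: np.ndarray, sample_weight: Optional[np.ndarray] = None
) -> float:
    """Hypothetical consumer: its keyword names dictate the valid section names."""
    return float(np.average(y, weights=sample_weight))


columns = {
    "a": np.array([1.0, 2.0, 3.0]),
    "b": np.array([4.0, 5.0, 6.0]),
    "target": np.array([0.0, 1.0, 1.0]),
    "w": np.array([1.0, 1.0, 2.0]),
}
split = build_split(columns, ["target"], {"sample_weight": ["w"]})
print(split["X"].shape, weighted_mean_fit(**split))  # (3, 2) 0.75

# A section name that does not match the consumer's keyword is rejected
bad = build_split(columns, ["target"], {"sample_weights": ["w"]})
try:
    weighted_mean_fit(**bad)
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Because the sections are unpacked with `**split`, a misnamed section fails only at call time with a TypeError, which is why the section names have to mirror the consumer's signature exactly.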
- property X(self)
Return the subset of columns that are not target labels
- Return type
numpy.ndarray
- get(self, column, split)
Explicitly split validation splits. Assumes self.dataframe has a get method that returns a dictionary of {‘X’: X, ‘y’: y}. If y is stored under another name, self.label_columns is used instead; only the first entry in that list is considered.
Returns None for any combination of column and split that isn’t present.
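The lookup rules documented for get can be sketched as a standalone function. This is a minimal illustration, not SimpleML's implementation: the name get_section, the nested split-to-sections layout of the dataframe, and the split names "TRAIN" and "VALIDATION" are all assumptions.

```python
from typing import Dict, List, Optional

import numpy as np

# Assumed layout: split name -> {section name -> ndarray}
SplitData = Dict[Optional[str], Dict[str, np.ndarray]]


def get_section(
    dataframe: SplitData,
    column: str,
    split: Optional[str],
    label_columns: Optional[List[str]] = None,
) -> Optional[np.ndarray]:
    """Hypothetical sketch of the documented get() behavior."""
    # Labels stored under another name: only the first label column is used
    if column == "y" and label_columns:
        column = label_columns[0]
    # A missing split or a missing column both fall through to None
    return dataframe.get(split, {}).get(column)


data: SplitData = {
    "TRAIN": {"X": np.ones((4, 2)), "target": np.array([0, 1, 0, 1])},
    "VALIDATION": {"X": np.zeros((2, 2))},
}
train_y = get_section(data, "y", "TRAIN", label_columns=["target"])
validation_y = get_section(data, "y", "VALIDATION", label_columns=["target"])  # None
test_x = get_section(data, "X", "TEST")  # None: split not present
```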
- property y(self)
Return the target label columns
- Return type
numpy.ndarray
- class simpleml.datasets.numpy.NumpyPipelineDataset(*args, **kwargs)
Bases: simpleml.datasets.numpy.base.BaseNumpyDataset
Dataset class with a predefined build routine, assuming dataset pipeline existence.
WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!
- Parameters
label_columns: Optional list of column names to register as the “y” split section.
other_named_split_sections: Optional map of section names to lists of column names for other arbitrary split columns. Section names must match the signatures the consumer expects (e.g. sample_weights), because these sections are passed through untouched downstream (e.g. sklearn.fit(**split)).
All other columns in the dataframe are automatically referenced as “X”.
- build_dataframe(self)
Transform the raw dataset via the dataset pipeline into a production-ready dataset. Override this method to remove the raw dataset requirement.
- Return type
None
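The build contract described for NumpyPipelineDataset (either supply a pipeline and raw dataset, or override build_dataframe) can be illustrated with two hypothetical classes. This is a sketch under stated assumptions, not the library's code: PipelineDatasetSketch and StaticDatasetSketch are invented names, a plain callable stands in for a real dataset pipeline object, and raising ValueError is an assumed failure mode that may differ from the real class.

```python
from typing import Callable, Dict, Optional

import numpy as np

Columns = Dict[str, np.ndarray]
# Assumption: a plain callable stands in for a real dataset pipeline object
Pipeline = Callable[[Columns], Columns]


class PipelineDatasetSketch:
    """Hypothetical stand-in for a pipeline-backed numpy dataset."""

    def __init__(self, raw: Optional[Columns] = None, pipeline: Optional[Pipeline] = None) -> None:
        self.raw = raw
        self.pipeline = pipeline
        self.dataframe: Optional[Columns] = None

    def build_dataframe(self) -> None:
        # Default routine: transform the raw dataset through the pipeline
        if self.pipeline is None or self.raw is None:
            raise ValueError("no pipeline/raw dataset: override build_dataframe instead")
        self.dataframe = self.pipeline(self.raw)


class StaticDatasetSketch(PipelineDatasetSketch):
    """Overriding build_dataframe removes the pipeline and raw dataset requirement."""

    def build_dataframe(self) -> None:
        self.dataframe = {"X": np.arange(6.0).reshape(3, 2), "y": np.array([0, 1, 1])}


def double(columns: Columns) -> Columns:
    # Toy pipeline step: scale every array by 2
    return {name: values * 2 for name, values in columns.items()}


piped = PipelineDatasetSketch(raw={"X": np.ones((2, 2))}, pipeline=double)
piped.build_dataframe()

static = StaticDatasetSketch()  # no pipeline or raw dataset needed
static.build_dataframe()

try:
    PipelineDatasetSketch().build_dataframe()
    missing_pipeline_failed = False
except ValueError:
    missing_pipeline_failed = True
```

As in the documented signature, build_dataframe returns None here and stores its result on the instance instead.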