simpleml.datasets
Imports modules to register class names in the global registry.
Defines convenience classes composed of different mixins.
Submodules
Package Contents
Classes
Dataset | Base class for all Dataset objects.
NumpyDataset | Composed mixin class with numpy helper methods and a predefined build routine, assuming dataset pipeline existence.
NumpyDatasetMixin | Assumes _external_file is a dictionary of numpy ndarrays.
PandasDataset | Composed mixin class with pandas helper methods and a predefined build routine, assuming dataset pipeline existence.
PandasDatasetMixin | “Pandas”esque mixin class with control mechanism for self.dataframe of type dataframe.
- exception simpleml.datasets.DatasetError(*args, **kwargs)
Bases: simpleml.utils.errors.SimpleMLError
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
- class simpleml.datasets.Dataset(has_external_files=True, label_columns=None, **kwargs)
Bases: simpleml.datasets.base_dataset.AbstractDataset
Base class for all Dataset objects.
pipeline_id: foreign key relation to the dataset pipeline used as input
- __table_args__
- __tablename__ = 'datasets'
- pipeline
- pipeline_id
- class simpleml.datasets.NumpyDataset(has_external_files=True, label_columns=None, **kwargs)
Bases: simpleml.datasets.base_dataset.Dataset, simpleml.datasets.numpy_mixin.NumpyDatasetMixin
Composed mixin class with numpy helper methods and a predefined build routine, assuming dataset pipeline existence.
WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!
- class simpleml.datasets.NumpyDatasetMixin
Bases: simpleml.datasets.abstract_mixin.AbstractDatasetMixin
Assumes _external_file is a dictionary of numpy ndarrays.
- property X
Return the subset that isn’t in the target labels.
- get(self, column, split)
Explicitly retrieve the requested column for a given validation split. Assumes self.dataframe has a get method that returns a dictionary of {'X': X, 'y': y}. Uses self.label_columns if y is named something else; only the first entry in the list is considered.
Returns None for any combination of column/split that isn’t present.
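The lookup rules described for get can be illustrated with a small stand-alone sketch. It does not import simpleml: ToyNumpyDataset, the split names, and the exact layout of the data (a dict keyed by split name, each value a dict of ndarrays) are assumptions made for illustration, not the library's implementation.

```python
import numpy as np


class ToyNumpyDataset:
    """Hypothetical stand-in that mimics the documented lookup rules."""

    def __init__(self, dataframe, label_columns=None):
        # Assumed layout: {split_name: {"X": ndarray, <label name>: ndarray}}
        self.dataframe = dataframe
        self.label_columns = label_columns or ["y"]

    def get(self, column, split):
        split_data = self.dataframe.get(split)
        if split_data is None:
            # Unknown split: nothing to return
            return None
        # Only the first entry of label_columns is consulted for "y"
        key = self.label_columns[0] if column == "y" else column
        return split_data.get(key)


data = {
    "TRAIN": {"X": np.arange(6).reshape(3, 2), "target": np.array([0, 1, 0])},
}
dataset = ToyNumpyDataset(data, label_columns=["target"])

train_X = dataset.get("X", "TRAIN")   # feature array for the TRAIN split
train_y = dataset.get("y", "TRAIN")   # labels, looked up under "target"
missing = dataset.get("X", "TEST")    # split not present, so None
```

Returning None rather than raising lets callers probe for optional splits without wrapping every lookup in a try/except.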
- get_feature_names(self)
Should return a list of the features in the dataset.
- property y
Return the target label columns.
- class simpleml.datasets.PandasDataset(has_external_files=True, label_columns=None, **kwargs)
Bases: simpleml.datasets.base_dataset.Dataset, simpleml.datasets.pandas_mixin.PandasDatasetMixin
Composed mixin class with pandas helper methods and a predefined build routine, assuming dataset pipeline existence.
WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!
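The warning above suggests that, when no pipeline is available, a subclass has to supply its own build_dataframe. A minimal sketch of that pattern follows. ToyBaseDataset and InMemoryDataset are hypothetical stand-ins written for this example; the real base class, its constructor arguments, and the exact error it raises are not shown here and may differ.

```python
import pandas as pd


class ToyBaseDataset:
    """Hypothetical stand-in for a dataset base class; not simpleml code."""

    def __init__(self, label_columns=None):
        self.label_columns = label_columns or []
        self.dataframe = None

    def build_dataframe(self):
        # Assumed behavior: with no pipeline and no override there is no data source
        raise NotImplementedError("override build_dataframe or provide a pipeline")


class InMemoryDataset(ToyBaseDataset):
    """Subclass that overrides build_dataframe with its own data source."""

    def build_dataframe(self):
        self.dataframe = pd.DataFrame(
            {"feature": [1.0, 2.0, 3.0], "label": [0, 1, 0]}
        )


dataset = InMemoryDataset(label_columns=["label"])
dataset.build_dataframe()  # populates dataset.dataframe
```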
- class simpleml.datasets.PandasDatasetMixin
Bases: simpleml.datasets.abstract_mixin.AbstractDatasetMixin
“Pandas”esque mixin class with control mechanism for self.dataframe of type dataframe. Only assumes pandas syntax, not types, so should be compatible with pandas drop-in replacements.
In particular:
- A - type of pd.DataFrame: query(), columns, drop(), __getitem__(), squeeze()
- B - any other type: get(), __getitem__(), squeeze()
- property X
Return the subset that isn’t in the target labels (across all potential splits).
- concatenate_dataframes(self, dataframes, split_names)
Helper method to merge dataframes into a single one with the split specified under DATAFRAME_SPLIT_COLUMN.
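One way to picture this merge is to tag each dataframe with its split name and then stack them. The sketch below is a stand-alone function rather than the mixin method, and the value assigned to DATAFRAME_SPLIT_COLUMN is a placeholder, since the actual constant's value is not given on this page.

```python
import pandas as pd

# Placeholder value; the real constant's value is an assumption here
DATAFRAME_SPLIT_COLUMN = "DATASET_SPLIT"


def concatenate_dataframes(dataframes, split_names):
    """Stack dataframes, tagging each row with the split it came from."""
    tagged = []
    for df, split_name in zip(dataframes, split_names):
        df = df.copy()  # avoid mutating the caller's dataframe
        df[DATAFRAME_SPLIT_COLUMN] = split_name
        tagged.append(df)
    return pd.concat(tagged, axis=0)


train = pd.DataFrame({"a": [1, 2], "label": [0, 1]})
test = pd.DataFrame({"a": [3], "label": [1]})
combined = concatenate_dataframes([train, test], ["TRAIN", "TEST"])
```

Keeping the split as an ordinary column means a single dataframe can carry every split and still be filtered back apart later.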
- get(self, column, split)
Explicitly retrieve the requested column for a given validation split. Assumes self.dataframe has a get method that returns the dataframe associated with the split. Uses self.label_columns to separate X and y columns inside the returned dataframe.
Returns an empty dataframe for missing combinations of column and split.
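The behavior described for get can be approximated with plain pandas. In the sketch below, get_column, SPLIT_COLUMN, and the split names are hypothetical, and the split is selected by filtering on a tag column, which is a simplification rather than the mixin's actual mechanism.

```python
import pandas as pd

# Hypothetical column name used to tag each row with its split
SPLIT_COLUMN = "DATASET_SPLIT"


def get_column(dataframe, label_columns, column, split):
    """Return the 'X' or 'y' portion of one split (illustrative only)."""
    # Select the rows of the requested split; an unknown split yields zero rows
    subset = dataframe[dataframe[SPLIT_COLUMN] == split].drop(columns=[SPLIT_COLUMN])
    if column == "y":
        return subset[label_columns]
    if column == "X":
        return subset.drop(columns=label_columns)
    # Unknown column name: fall back to an empty dataframe
    return pd.DataFrame()


frame = pd.DataFrame(
    {
        "age": [31, 45, 27],
        "label": [0, 1, 0],
        SPLIT_COLUMN: ["TRAIN", "TRAIN", "TEST"],
    }
)
train_X = get_column(frame, ["label"], "X", "TRAIN")
train_y = get_column(frame, ["label"], "y", "TRAIN")
missing = get_column(frame, ["label"], "X", "VALIDATION")  # empty dataframe
```

Callers can check `missing.empty` instead of testing for None, which keeps downstream pandas code free of special cases.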
- get_feature_names(self)
Should return a list of the features in the dataset.
- static load_csv(filename, **kwargs)
Helper method to read in a csv file.
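A helper with this signature is plausibly a thin wrapper around pandas.read_csv that forwards its keyword arguments. That is an assumption, not something this page confirms. The sketch below shows the pattern as a stand-alone function and reads from an in-memory buffer so it runs without a file on disk.

```python
import io

import pandas as pd


def load_csv(filename, **kwargs):
    """Read a csv into a dataframe, forwarding options to pandas.read_csv."""
    return pd.read_csv(filename, **kwargs)


# pandas.read_csv accepts file-like objects as well as paths
csv_text = "a,b\n1,2\n3,4\n"
df = load_csv(io.StringIO(csv_text), usecols=["a"])
```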
- property y
Return the target label columns.