simpleml.datasets package
Imports modules to register class names in the global registry.
Defines convenience classes composed of different mixins.
- class simpleml.datasets.NumpyDataset(has_external_files=True, label_columns=None, **kwargs)
Bases: simpleml.datasets.base_dataset.Dataset, simpleml.datasets.numpy_mixin.NumpyDatasetMixin
Composed mixin class with numpy helper methods and a predefined build routine that assumes a dataset pipeline exists.
WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!
- build_dataframe()
Transform the raw dataset via the dataset pipeline into a production-ready dataset. Override this method to disable the raw dataset requirement.
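The override pattern described in the warning can be sketched with stand-in classes. This is only an illustration under stated assumptions, not simpleml's implementation: `SketchDataset`, `InMemoryDataset`, the callable pipeline, and the `dataframe` attribute are hypothetical; only the names `build_dataframe` and `pipeline` come from the documentation above.

```python
class SketchDataset:
    """Hypothetical stand-in for a dataset base class (not simpleml's code)."""

    def __init__(self, pipeline=None):
        self.pipeline = pipeline
        self.dataframe = None  # hypothetical attribute holding the built data

    def build_dataframe(self):
        # Default routine: needs a pipeline to transform the raw dataset.
        if self.pipeline is None:
            raise ValueError("No pipeline provided; override build_dataframe instead")
        # Assumption: the pipeline is modelled here as a plain callable.
        self.dataframe = self.pipeline()


class InMemoryDataset(SketchDataset):
    """Overrides build_dataframe, so no pipeline or raw dataset is required."""

    def build_dataframe(self):
        self.dataframe = [[1.0, 2.0], [3.0, 4.0]]


dataset = InMemoryDataset()
dataset.build_dataframe()  # succeeds without a pipeline
```

Calling `SketchDataset().build_dataframe()` in this sketch raises `ValueError`, which mirrors the failure mode the warning describes.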
- created_timestamp
- filepaths
- has_external_files
- hash_
- id
- metadata_
- modified_timestamp
- name
- pipeline
- pipeline_id
- project
- registered_name
- version
- version_description
- class simpleml.datasets.PandasDataset(has_external_files=True, label_columns=None, **kwargs)
Bases: simpleml.datasets.base_dataset.Dataset, simpleml.datasets.pandas_mixin.PandasDatasetMixin
Composed mixin class with pandas helper methods and a predefined build routine that assumes a dataset pipeline exists.
WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!
- build_dataframe()
Transform the raw dataset via the dataset pipeline into a production-ready dataset. Override this method to disable the raw dataset requirement.
- created_timestamp
- filepaths
- has_external_files
- hash_
- id
- static merge_split(split)
Helper method to merge all dataframes in a split object into a single dataframe via a column-wise join.
Example: df1 = [A, B, C] (4 rows) + df2 = [D, E, F] (4 rows) returns [A, B, C, D, E, F] (4 rows).
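The column-wise join in the example above can be reproduced with plain pandas. This is a sketch rather than simpleml's own code: `merge_split_sketch` is a hypothetical helper, and it treats the split as a simple list of dataframes, which is an assumption about the split object's structure.

```python
import pandas as pd


def merge_split_sketch(frames):
    """Hypothetical helper: column-wise join of dataframes sharing a row index."""
    # axis=1 places the frames side by side instead of stacking their rows.
    return pd.concat(list(frames), axis=1)


df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
df2 = pd.DataFrame({"D": [0, 1, 0, 1], "E": [2, 3, 2, 3], "F": [4, 5, 4, 5]})

merged = merge_split_sketch([df1, df2])
print(list(merged.columns))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(merged.shape)          # (4, 6)
```

Because `pd.concat` aligns on the index, frames with mismatched indexes would produce extra rows filled with NaN, so the row counts only stay equal when the indexes match.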
- metadata_
- modified_timestamp
- name
- pipeline
- pipeline_id
- project
- registered_name
- version
- version_description