simpleml.datasets package

Import modules to register class names in the global registry.

Define convenience classes composed of different mixins.
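Importing a module matters here because a class's body runs when its module is imported. If registration happens at class-definition time, importing a module is enough to make its classes available by name. The sketch below shows that pattern. It assumes registration happens when a subclass is defined. `REGISTRY` and `Registered` are hypothetical names, and simpleml's actual registry mechanism may differ.

```python
# Hypothetical global registry: maps class names to class objects.
REGISTRY: dict = {}


class Registered:
    """Hypothetical base class that records every subclass under its name."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass, at the moment the class statement executes
        # (i.e. when the defining module is imported).
        REGISTRY[cls.__name__] = cls


class MyDataset(Registered):
    pass


# Defining (importing) the class was enough to register it.
print(REGISTRY["MyDataset"] is MyDataset)  # True
```

Only subclasses are recorded. The base class `Registered` itself never appears in `REGISTRY`.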

class simpleml.datasets.NumpyDataset(has_external_files=True, label_columns=[], **kwargs)

Bases: simpleml.datasets.base_dataset.Dataset, simpleml.datasets.numpy_mixin.NumpyDatasetMixin

Composed mixin class with NumPy helper methods and a predefined build routine that assumes a dataset pipeline exists.

WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!
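The composition pattern works as follows. A composed class lists the persistable base first and the helper mixin second, matching the documented `Bases:` order, so it inherits state from one and helpers from the other. The sketch below is self-contained. `Dataset`, `NumpyHelpersMixin`, `get_feature_matrix`, and `get_labels` are hypothetical stand-ins, not simpleml's actual methods. It also assumes the built data is a 2-D array whose last column is the label.

```python
import numpy as np


class Dataset:
    """Hypothetical stand-in for the base class: holds the built data."""

    def __init__(self, data=None):
        self.dataframe = data


class NumpyHelpersMixin:
    """Hypothetical helper mixin; assumes the last column is the label."""

    def get_feature_matrix(self):
        return self.dataframe[:, :-1]

    def get_labels(self):
        return self.dataframe[:, -1]


class ComposedNumpyDataset(Dataset, NumpyHelpersMixin):
    """Base class first, mixin second, as in the documented Bases line."""


ds = ComposedNumpyDataset(np.array([[1.0, 2.0, 0.0], [3.0, 4.0, 1.0]]))
print(ds.get_feature_matrix().shape)  # (2, 2)
print(ComposedNumpyDataset.__mro__)
```

Because the mixin defines no `__init__`, construction goes through the base class. The mixin only adds methods that read the state the base class holds.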

author
build_dataframe()

Transform the raw dataset via the dataset pipeline into a production-ready dataset. Override this method to remove the raw dataset requirement.
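The sketch below contrasts the default build routine with an override. By default, building requires a pipeline. A subclass can override `build_dataframe` to produce its data directly. `PipelineBackedDataset`, `InlineDataset`, and the `dataframe` attribute are hypothetical stand-ins. The real class's pipeline object and storage attributes are not shown here.

```python
from typing import Callable, Optional

import numpy as np


class PipelineBackedDataset:
    """Hypothetical stand-in for the default, pipeline-dependent build routine."""

    def __init__(self, pipeline: Optional[Callable[[], np.ndarray]] = None):
        self.pipeline = pipeline
        self.dataframe = None

    def build_dataframe(self):
        # Default: the data can only be produced by running the pipeline.
        if self.pipeline is None:
            raise ValueError("build_dataframe requires a dataset pipeline")
        self.dataframe = self.pipeline()


class InlineDataset(PipelineBackedDataset):
    """Override: build the data directly, so no pipeline or raw dataset is needed."""

    def build_dataframe(self):
        self.dataframe = np.array([[1, 0], [2, 1], [3, 0]])


try:
    PipelineBackedDataset().build_dataframe()
except ValueError as err:
    print(err)  # build_dataframe requires a dataset pipeline

inline = InlineDataset()
inline.build_dataframe()
print(inline.dataframe.shape)  # (3, 2)
```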

created_timestamp
filepaths
has_external_files
hash_
id
metadata_
modified_timestamp
name
pipeline
pipeline_id
project
registered_name
version
version_description
class simpleml.datasets.PandasDataset(has_external_files=True, label_columns=[], **kwargs)

Bases: simpleml.datasets.base_dataset.Dataset, simpleml.datasets.pandas_mixin.PandasDatasetMixin

Composed mixin class with pandas helper methods and a predefined build routine that assumes a dataset pipeline exists.

WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!

author
build_dataframe()

Transform the raw dataset via the dataset pipeline into a production-ready dataset. Override this method to remove the raw dataset requirement.

created_timestamp
filepaths
has_external_files
hash_
id
static merge_split(split)

Helper method to merge all dataframes in a split object into a single dataframe using a column-wise join. Example: df1 = [A, B, C] (4 rows) + df2 = [D, E, F] (4 rows) returns [A, B, C, D, E, F] (4 rows).
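The column-wise join can be approximated with `pandas.concat` along `axis=1`. The sketch below assumes the split is a plain dict of dataframes that share the same row index. `merge_split_sketch` is a hypothetical name, and this is not necessarily how `merge_split` is implemented.

```python
import pandas as pd


def merge_split_sketch(split: dict) -> pd.DataFrame:
    """Column-wise join of every dataframe in a dict-like split.

    Assumes all frames share the same row index.
    """
    return pd.concat(list(split.values()), axis=1)


df1 = pd.DataFrame({"A": range(4), "B": range(4), "C": range(4)})
df2 = pd.DataFrame({"D": range(4), "E": range(4), "F": range(4)})
merged = merge_split_sketch({"X": df1, "y": df2})
print(list(merged.columns))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(merged.shape)  # (4, 6)
```

`pd.concat(..., axis=1)` aligns rows on the index with an outer join by default. Frames whose indexes differ therefore produce extra rows filled with NaN rather than raising an error.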

metadata_
modified_timestamp
name
pipeline
pipeline_id
project
registered_name
version
version_description