simpleml.datasets.processed_datasets.base_processed_dataset module

class simpleml.datasets.processed_datasets.base_processed_dataset.AbstractBaseProcessedDataset(has_external_files=True, **kwargs)[source]

Bases: simpleml.datasets.base_dataset.BaseDataset

Abstract Base class for all Processed Dataset objects.

add_pipeline(pipeline)[source]

Setter method for the dataset pipeline used as input.

load(**kwargs)[source]

Extend the main load routine to also load the relationship class (the dataset pipeline).

save(**kwargs)[source]

Extend the parent save function with a few additional save routines.
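
The add_pipeline/save/load pattern described above can be sketched with toy stand-ins. This is a minimal illustration, not the simpleml implementation: ToyPipeline, ToyBaseDataset, ToyProcessedDataset, the events list, and the specific extra save step are all hypothetical names and assumptions introduced here, since this page does not show what the additional routines actually do.

```python
class ToyPipeline:
    """Hypothetical stand-in for a dataset pipeline."""

    def __init__(self, name):
        self.name = name
        self.loaded = False

    def load(self):
        # Pretend to restore the pipeline's own state.
        self.loaded = True


class ToyBaseDataset:
    """Hypothetical stand-in for the parent dataset class."""

    def __init__(self, has_external_files=True, **kwargs):
        self.has_external_files = has_external_files
        self.events = []  # records which routines ran, for illustration

    def save(self, **kwargs):
        self.events.append("base save")

    def load(self, **kwargs):
        self.events.append("base load")


class ToyProcessedDataset(ToyBaseDataset):
    """Mirrors the add_pipeline/save/load shape documented above."""

    def __init__(self, has_external_files=True, **kwargs):
        super().__init__(has_external_files=has_external_files, **kwargs)
        self.pipeline = None

    def add_pipeline(self, pipeline):
        # Plain setter: remember which pipeline produced this dataset.
        self.pipeline = pipeline

    def save(self, **kwargs):
        # Assumed extra routine: refuse to save without a pipeline.
        if self.pipeline is None:
            raise ValueError("set a pipeline before saving")
        super().save(**kwargs)
        self.events.append("recorded pipeline " + self.pipeline.name)

    def load(self, **kwargs):
        # Run the parent routine first, then load the related pipeline.
        super().load(**kwargs)
        if self.pipeline is not None:
            self.pipeline.load()
            self.events.append("loaded pipeline " + self.pipeline.name)


dataset = ToyProcessedDataset()
dataset.add_pipeline(ToyPipeline("cleaning"))
dataset.save()
dataset.load()
print(dataset.events)
```

Calling the parent routine through super() before the extra steps keeps the base behavior intact, so a subclass only adds work and never has to duplicate the parent logic.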

class simpleml.datasets.processed_datasets.base_processed_dataset.BaseNumpyProcessedDataset(has_external_files=True, **kwargs)[source]

Bases: simpleml.datasets.processed_datasets.base_processed_dataset.BaseProcessedDataset, simpleml.datasets.numpy_mixin.NumpyDatasetMixin

author
build_dataframe()[source]

Transform the raw dataset via the dataset pipeline to produce a production-ready dataset. Override this method to disable the raw dataset requirement.

created_timestamp
filepaths
has_external_files
hash_
id
metadata_
modified_timestamp
name
pipeline
pipeline_id
project
registered_name
version
version_description
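
The build_dataframe() behavior documented for this class can be sketched roughly as follows. This is a toy illustration under stated assumptions: ToyPipeline, ToyProcessedDataset, ToyStandaloneDataset, and the transform() step are hypothetical, and plain Python lists stand in for NumPy arrays so the example has no dependencies. It shows the default path (a pipeline transforms raw data) and an override that removes the raw dataset requirement.

```python
class ToyPipeline:
    """Hypothetical pipeline that owns raw rows and a transform step."""

    def __init__(self, raw_rows):
        self.raw_rows = raw_rows

    def transform(self):
        # Toy transformation: drop rows with missing values, then double.
        return [
            [value * 2 for value in row]
            for row in self.raw_rows
            if None not in row
        ]


class ToyProcessedDataset:
    """Builds its data by running the attached pipeline."""

    def __init__(self):
        self.pipeline = None
        self.dataframe = None

    def add_pipeline(self, pipeline):
        self.pipeline = pipeline

    def build_dataframe(self):
        # Default behavior: a pipeline (and its raw data) is required.
        if self.pipeline is None:
            raise ValueError("set a pipeline before building")
        self.dataframe = self.pipeline.transform()


class ToyStandaloneDataset(ToyProcessedDataset):
    """Overrides build_dataframe so no raw dataset is needed."""

    def build_dataframe(self):
        # Produce the data directly instead of transforming raw input.
        self.dataframe = [[1, 2], [3, 4]]


processed = ToyProcessedDataset()
processed.add_pipeline(ToyPipeline([[1, 2], [3, None], [5, 6]]))
processed.build_dataframe()
print(processed.dataframe)

standalone = ToyStandaloneDataset()
standalone.build_dataframe()
print(standalone.dataframe)
```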

class simpleml.datasets.processed_datasets.base_processed_dataset.BasePandasProcessedDataset(has_external_files=True, **kwargs)[source]

Bases: simpleml.datasets.processed_datasets.base_processed_dataset.BaseProcessedDataset, simpleml.datasets.pandas_mixin.PandasDatasetMixin

author
build_dataframe()[source]

Transform the raw dataset via the dataset pipeline to produce a production-ready dataset. Override this method to disable the raw dataset requirement.

created_timestamp
filepaths
has_external_files
hash_
id
metadata_
modified_timestamp
name
pipeline
pipeline_id
project
registered_name
version
version_description

class simpleml.datasets.processed_datasets.base_processed_dataset.BaseProcessedDataset(has_external_files=True, **kwargs)[source]

Bases: simpleml.datasets.processed_datasets.base_processed_dataset.AbstractBaseProcessedDataset

Base class for all Processed Dataset objects.

pipeline_id: foreign key relation to the dataset pipeline used as input
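
To make the pipeline_id relation concrete, here is a small sketch of a foreign key from a dataset row to a pipeline row, using the standard library sqlite3 module. The table names, column names, and sample values are assumptions chosen for illustration; they are not simpleml's actual schema, and simpleml's own persistence layer is not used here.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: each dataset row points at the pipeline that built it.
conn.execute("CREATE TABLE pipelines (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE datasets ("
    " id TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " pipeline_id TEXT REFERENCES pipelines (id))"
)

pipeline_id = str(uuid.uuid4())
conn.execute("INSERT INTO pipelines VALUES (?, ?)", (pipeline_id, "cleaning"))
conn.execute(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    (str(uuid.uuid4()), "training_data", pipeline_id),
)

# Following the foreign key recovers the pipeline for a dataset.
row = conn.execute(
    "SELECT d.name, p.name FROM datasets d"
    " JOIN pipelines p ON p.id = d.pipeline_id"
).fetchone()
print(row)

# A dataset that references a nonexistent pipeline is rejected.
try:
    conn.execute(
        "INSERT INTO datasets VALUES (?, ?, ?)",
        (str(uuid.uuid4()), "orphan", "missing-id"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Storing only the identifier keeps the dataset record small, while the relation lets the full pipeline object be resolved on demand.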

author
created_timestamp
filepaths
has_external_files
hash_
id
metadata_
modified_timestamp
name
pipeline
pipeline_id
project
registered_name
version
version_description