simpleml.datasets.processed_datasets.base_processed_dataset module

class simpleml.datasets.processed_datasets.base_processed_dataset.AbstractBaseProcessedDataset(has_external_files=True, **kwargs)

Bases: simpleml.datasets.base_dataset.BaseDataset

Abstract Base class for all Processed Dataset objects.

add_pipeline(pipeline): Setter method for the dataset pipeline used.

load(**kwargs): Extends the main load routine to also load the relationship class.

save(**kwargs): Extends the parent function with a few additional save routines.
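
The three methods above follow a common pattern: a setter attaches the pipeline, and save/load extend the parent routines to record and restore that relationship. The following is a minimal, self-contained sketch of that pattern. Everything in it (`SketchPipeline`, `SketchDataset`, `SketchProcessedDataset`, and the in-memory `REGISTRY`) is a hypothetical stand-in; it does not import SimpleML and should not be read as the library's implementation.

```python
# Hypothetical stand-ins illustrating the pattern; not SimpleML code.
REGISTRY = {}  # assumed in-memory substitute for persisted pipelines


class SketchPipeline:
    def __init__(self, id_):
        self.id = id_
        REGISTRY[id_] = self


class SketchDataset:
    """Stand-in parent class with basic save/load hooks."""

    def __init__(self):
        self.saved = False
        self.loaded = False

    def save(self, **kwargs):
        self.saved = True

    def load(self, **kwargs):
        self.loaded = True


class SketchProcessedDataset(SketchDataset):
    def __init__(self):
        super().__init__()
        self.pipeline = None
        self.pipeline_id = None

    def add_pipeline(self, pipeline):
        # Setter: remember which pipeline produced this dataset.
        self.pipeline = pipeline

    def save(self, **kwargs):
        # Additional save step (assumed): record the pipeline reference,
        # then defer to the parent save routine.
        if self.pipeline is None:
            raise ValueError("add a pipeline before saving")
        self.pipeline_id = self.pipeline.id
        super().save(**kwargs)

    def load(self, **kwargs):
        # Run the parent load routine, then restore the related pipeline.
        super().load(**kwargs)
        self.pipeline = REGISTRY[self.pipeline_id]
```

In this sketch, calling `add_pipeline` and then `save` stores the pipeline's id on the dataset, and a later `load` looks the pipeline object up again from that id.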

class simpleml.datasets.processed_datasets.base_processed_dataset.BaseNumpyProcessedDataset(has_external_files=True, **kwargs)

Bases: simpleml.datasets.processed_datasets.base_processed_dataset.BaseProcessedDataset, simpleml.datasets.numpy_mixin.NumpyDatasetMixin

author

build_dataframe(): Transforms the raw dataset via the dataset pipeline to produce a production-ready dataset. Override this method to disable the raw dataset requirement.

created_timestamp

filepaths

has_external_files

hash_

id

metadata_

modified_timestamp

name

pipeline

pipeline_id

project

registered_name

version

version_description
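
The `build_dataframe` behavior described for this class (transform the raw dataset through the dataset pipeline, or override the method to drop the raw dataset requirement) can be sketched as follows. This is an illustration under stated assumptions: the classes `SketchPipeline`, `SketchProcessedDataset`, and `StandaloneDataset` are hypothetical, plain Python lists stand in for NumPy arrays, and the `raw_rows`, `transform`, and `rows` attributes are invented for the example rather than taken from SimpleML.

```python
# Hypothetical stand-ins; plain lists are used in place of NumPy arrays.
class SketchPipeline:
    """Assumed pipeline holding raw rows and a per-row transform."""

    def __init__(self, raw_rows, transform):
        self.raw_rows = raw_rows
        self.transform = transform


class SketchProcessedDataset:
    def __init__(self, pipeline=None):
        self.pipeline = pipeline
        self.rows = None

    def build_dataframe(self):
        # Default behavior (assumed): a pipeline with raw data is required.
        if self.pipeline is None:
            raise ValueError("a dataset pipeline is required")
        self.rows = [self.pipeline.transform(r) for r in self.pipeline.raw_rows]


class StandaloneDataset(SketchProcessedDataset):
    def build_dataframe(self):
        # Override: produce the data directly, so no raw dataset is needed.
        self.rows = [0.0, 0.5, 1.0]
```

The subclass shows the override route: because it replaces `build_dataframe` entirely, it can be built without any pipeline attached.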

class simpleml.datasets.processed_datasets.base_processed_dataset.BasePandasProcessedDataset(has_external_files=True, **kwargs)

Bases: simpleml.datasets.processed_datasets.base_processed_dataset.BaseProcessedDataset, simpleml.datasets.pandas_mixin.PandasDatasetMixin

author

build_dataframe(): Transforms the raw dataset via the dataset pipeline to produce a production-ready dataset. Override this method to disable the raw dataset requirement.

created_timestamp

filepaths

has_external_files

hash_

id

metadata_

modified_timestamp

name

pipeline

pipeline_id

project

registered_name

version

version_description

class simpleml.datasets.processed_datasets.base_processed_dataset.BaseProcessedDataset(has_external_files=True, **kwargs)

Bases: simpleml.datasets.processed_datasets.base_processed_dataset.AbstractBaseProcessedDataset

Base class for all Processed Dataset objects.

pipeline_id: foreign key relation to the dataset pipeline used as input
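
To illustrate what a foreign key relation like `pipeline_id` implies, here is a small sketch using Python's built-in `sqlite3` module. The table names (`pipelines`, `datasets`), column types, and row values are assumptions made up for the example; they are not SimpleML's actual schema or persistence layer.

```python
import sqlite3

# Hypothetical schema for illustration only; not SimpleML's tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE pipelines (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE datasets ("
    " id TEXT PRIMARY KEY,"
    " pipeline_id TEXT REFERENCES pipelines(id))"
)

# A dataset row may point at an existing pipeline row.
conn.execute("INSERT INTO pipelines VALUES ('p1')")
conn.execute("INSERT INTO datasets VALUES ('d1', 'p1')")

# Pointing at a pipeline that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO datasets VALUES ('d2', 'missing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The point of the sketch is only that each dataset row references exactly one pipeline row, and the database refuses references to pipelines that were never stored.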

author

created_timestamp

filepaths

has_external_files

hash_

id

metadata_

modified_timestamp

name

pipeline

pipeline_id

project

registered_name

version

version_description