simpleml.datasets.base_dataset module
Base Module for Datasets
Two use cases:

- Processed, or traditional, datasets: in situations of clean, representative data, these can be used directly for modeling purposes.
- Otherwise, a raw dataset needs to be created first, with a dataset pipeline to transform it into the processed form.
class simpleml.datasets.base_dataset.AbstractDataset(has_external_files=True, label_columns=[], **kwargs)[source]

Bases: simpleml.persistables.base_persistable.Persistable, simpleml.persistables.saving.AllSaveMixin
Abstract Base class for all Dataset objects.

Every dataset has a "dataframe" object associated with it that is responsible for housing the data. The term dataframe is a bit of a misnomer since it does not need to be a pandas.DataFrame object.

Each dataframe can be subdivided by inheriting classes and mixins to support numerous representations, e.g.:

- a y column for supervised learning
- train/test/validation splits
- ...

Datasets can be constructed from scratch or as derivatives of existing datasets. In the event of derivation, a pipeline must be specified to transform the original data.

Schema: no additional columns.
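As a sketch of the contract described above, a concrete dataset is responsible for populating its internal data object via `build_dataframe`. The class and helper names below are hypothetical illustrations, not the actual SimpleML implementation; note the "dataframe" here is a plain list of dicts, underscoring that it need not be a pandas.DataFrame:

```python
# Hypothetical sketch of the AbstractDataset contract; names and
# behavior are illustrative only, not the actual SimpleML code.
class InMemoryDataset:
    def __init__(self, records, has_external_files=True, label_columns=None):
        self.records = records
        self.has_external_files = has_external_files
        self.label_columns = label_columns or []
        self._external_file = None  # build_dataframe is responsible for setting this

    def build_dataframe(self):
        # Must set self._external_file; here the "dataframe" is a list of
        # dicts, since it does not need to be a pandas.DataFrame
        self._external_file = list(self.records)

    @property
    def dataframe(self):
        # Lazily build on first access
        if self._external_file is None:
            self.build_dataframe()
        return self._external_file

    def get_labels(self):
        # Project out the label (y) columns from each row
        return [{k: row[k] for k in self.label_columns} for row in self.dataframe]
```

Usage: `InMemoryDataset([{'x': 1, 'y': 0}], label_columns=['y']).get_labels()` returns only the `y` column, mirroring the supervised-learning subdivision mentioned above.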
build_dataframe()[source]
Must set self._external_file. Can't be set as an abstractmethod because of a database lookup dependency.
dataframe

label_columns
Keep the column list for labels in metadata to persist through saving.
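The persistence idea behind `label_columns` can be illustrated with a minimal round trip: store the column list in the metadata mapping so it survives serialization. This is a hypothetical sketch; the actual SimpleML save mechanism (via AllSaveMixin and the database) differs:

```python
import json

# Hypothetical illustration of persisting label_columns through metadata;
# the real SimpleML save path is more involved.
class DatasetRecord:
    def __init__(self, label_columns):
        # Keep the column list inside metadata_ so it survives serialization
        self.metadata_ = {'label_columns': list(label_columns)}

    @property
    def label_columns(self):
        return self.metadata_['label_columns']

    def dumps(self):
        return json.dumps(self.metadata_)

    @classmethod
    def loads(cls, payload):
        obj = cls.__new__(cls)
        obj.metadata_ = json.loads(payload)
        return obj
```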
object_type = 'DATASET'
class simpleml.datasets.base_dataset.Dataset(has_external_files=True, label_columns=[], **kwargs)[source]

Bases: simpleml.datasets.base_dataset.AbstractDataset

Base class for all Dataset objects.

pipeline_id: foreign key relation to the dataset pipeline used as input.
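The derivation pattern (a processed dataset built from raw data via a pipeline, recording the pipeline's id as its input reference) can be sketched as follows. All names here are hypothetical stand-ins, not the actual SimpleML API:

```python
# Hypothetical sketch of dataset derivation: a processed dataset is built
# by running a pipeline over raw data and records the pipeline's id.
class DatasetPipeline:
    def __init__(self, pipeline_id, transform):
        self.id = pipeline_id
        self.transform = transform  # callable applied row by row

class ProcessedDataset:
    def __init__(self, raw_rows, pipeline):
        self.pipeline = pipeline
        self.pipeline_id = pipeline.id  # foreign-key-style link to the input pipeline
        self.raw_rows = raw_rows
        self._external_file = None

    def build_dataframe(self):
        # The processed form is the pipeline's transform applied to the raw data
        self._external_file = [self.pipeline.transform(row) for row in self.raw_rows]
        return self._external_file
```

The design point is that the processed dataset never stores its own copy of the transformation logic; reproducibility comes from the link back to the pipeline that produced it.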
created_timestamp
filepaths
has_external_files
hash_
id
metadata_
modified_timestamp
name
pipeline
pipeline_id
project
registered_name
version
version_description