simpleml.datasets.base_dataset module

class simpleml.datasets.base_dataset.BaseDataset(has_external_files=True, **kwargs)

Bases: simpleml.persistables.base_persistable.BasePersistable, simpleml.persistables.saving.AllSaveMixin

Base class for all Dataset objects.

Every dataset has one dataframe associated with it, and inheriting classes can subdivide it (e.g., a y column for supervised learning, train/test/validation splits).
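The subdivision idea can be illustrated with a small standalone sketch. It is not the simpleml API: the class names `SketchDataset` and `SketchSupervisedDataset`, the `X`/`y`/`split` members, and the use of a list of row dicts as a stand-in for a dataframe are all hypothetical, chosen so the example runs without extra dependencies.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a dataframe: a list of row dicts
Row = Dict[str, object]


class SketchDataset:
    """Hypothetical base: exactly one dataframe per dataset."""

    def __init__(self, rows: List[Row]):
        self.dataframe = rows


class SketchSupervisedDataset(SketchDataset):
    """Hypothetical subclass that subdivides the single dataframe."""

    def __init__(self, rows: List[Row], label_columns: List[str]):
        super().__init__(rows)
        self.label_columns = label_columns

    @property
    def X(self) -> List[Row]:
        # Every column that is not a label column
        return [{k: v for k, v in row.items() if k not in self.label_columns}
                for row in self.dataframe]

    @property
    def y(self) -> List[Row]:
        # Only the label columns
        return [{k: row[k] for k in self.label_columns} for row in self.dataframe]

    def split(self, train_fraction: float) -> Tuple[List[Row], List[Row]]:
        # Deterministic head/tail split; real split strategies will differ
        cut = int(len(self.dataframe) * train_fraction)
        return self.dataframe[:cut], self.dataframe[cut:]
```

Both views are derived from the same underlying dataframe, so only that one dataframe needs to be stored.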

Dataset storage is the final resulting dataframe, so, technically, a dataset is uniquely determined by its Dataset class together with its Dataset Pipeline.
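Because the class and the pipeline together determine the stored dataframe, an identifier can in principle be derived from those two inputs alone. The sketch below assumes a hypothetical `dataset_identity` helper that hashes the class name with a JSON-serializable pipeline description; it is not how simpleml computes its own hashes.

```python
import hashlib
import json


def dataset_identity(dataset_class_name: str, pipeline_config: dict) -> str:
    """Hypothetical: derive a dataset identifier from class + pipeline only."""
    payload = json.dumps(
        {"class": dataset_class_name, "pipeline": pipeline_config},
        sort_keys=True,  # stable key order so equal configs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Identical class/pipeline pairs yield the same identifier, and changing either input yields a different one, which mirrors the "uniquely determined by" statement.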

Schema: no additional database columns beyond those defined by the base persistable.

build_dataframe()

Must set self._external_file. It cannot be declared as an abstractmethod because of a database lookup dependency.
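A common substitute for an abstract method is a plain method that raises `NotImplementedError`: the base class stays instantiable (one plausible reading of the "database lookup dependency" is that records must be reconstructable as base-class objects), while calling the method without an override still fails loudly. The sketch below is hypothetical; the class names are invented, and the lazy build inside `dataframe` is an assumption, not confirmed simpleml behavior.

```python
class SketchBaseDataset:
    """Hypothetical base class; instantiable because nothing is abstract."""

    def __init__(self):
        self._external_file = None

    def build_dataframe(self):
        # Plain method instead of @abc.abstractmethod: creating the base
        # object still works, but using it without an override raises.
        raise NotImplementedError("Subclasses must set self._external_file")

    @property
    def dataframe(self):
        # Assumption: build lazily on first access
        if self._external_file is None:
            self.build_dataframe()
        return self._external_file


class SketchListDataset(SketchBaseDataset):
    """Hypothetical subclass that fulfils the contract."""

    def __init__(self, rows):
        super().__init__()
        self._rows = rows

    def build_dataframe(self):
        # The contract: populate self._external_file
        self._external_file = list(self._rows)
```

With `abc.abstractmethod` on an `abc.ABC` subclass, `SketchBaseDataset()` itself would raise `TypeError`; the plain-method version defers the error to the point of use.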

dataframe
label_columns

Keeps the list of label columns in the object's metadata so that it persists through saving.
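Storing `label_columns` in a serialized metadata mapping, rather than as a transient attribute, means the value survives a save/load round trip. The sketch assumes a hypothetical `SketchPersistable` with a plain `metadata` dict and JSON-based `save`/`load` methods; simpleml's actual persistence layer is not shown here.

```python
import json


class SketchPersistable:
    """Hypothetical persistable whose metadata dict is what gets saved."""

    def __init__(self):
        self.metadata = {}

    @property
    def label_columns(self):
        # Default to no label columns if never set
        return self.metadata.get("label_columns", [])

    @label_columns.setter
    def label_columns(self, columns):
        # Stored inside metadata so it is serialized with the object
        self.metadata["label_columns"] = list(columns)

    def save(self) -> str:
        return json.dumps(self.metadata)

    @classmethod
    def load(cls, saved: str) -> "SketchPersistable":
        obj = cls()
        obj.metadata = json.loads(saved)
        return obj
```

Usage: set `ds.label_columns = ["target"]`, then `SketchPersistable.load(ds.save()).label_columns` returns `["target"]`.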