simpleml.pipelines.validation_split_mixins module

Module providing the different split methods for cross validation:

  1. No Split – Just use all the data
  2. Explicit Split – dataset class defines the split
  3. Percentage – random split support for train, validation, test
  4. Chronological – time based split support for train, validation, test
  5. KFold

class simpleml.pipelines.validation_split_mixins.ChronologicalSplitMixin(**kwargs)

Bases: simpleml.pipelines.validation_split_mixins.SplitMixin
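The source documents no parameters for this mixin beyond `**kwargs`. As a rough illustration of what a time-based split could look like, the sketch below sorts records by a timestamp and cuts them at fixed fractions. The function name `chronological_split`, its `key` and size arguments, and the `VALIDATION`/`TEST` key names are all hypothetical and not part of simpleml.

```python
from datetime import date


def chronological_split(records, key, train_size=0.6, validation_size=0.2):
    """Hypothetical helper: oldest rows go to train, newest rows go to test."""
    ordered = sorted(records, key=key)
    n = len(ordered)
    train_end = int(n * train_size)
    validation_end = train_end + int(n * validation_size)
    return {
        "TRAIN": ordered[:train_end],
        "VALIDATION": ordered[train_end:validation_end],
        "TEST": ordered[validation_end:],
    }


# Rows arrive out of order; the split only depends on the timestamp
rows = [{"day": date(2020, 1, d), "value": d} for d in (5, 1, 4, 2, 3)]
splits = chronological_split(rows, key=lambda r: r["day"])
```

Unlike a random split, no shuffling takes place, so every test row is newer than every train row.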

class simpleml.pipelines.validation_split_mixins.ExplicitSplitMixin

Bases: simpleml.pipelines.validation_split_mixins.SplitMixin

split_dataset()

Method to split the dataframe into different sets. Assumes the dataset explicitly delineates between train, validation, and test.

class simpleml.pipelines.validation_split_mixins.KFoldSplitMixin

Bases: simpleml.pipelines.validation_split_mixins.SplitMixin

Not yet implemented; the approach is TBD. KFold requires K models and unique datasets, so it may be easier to wrap a parallelized implementation that internally creates K new Pipeline and Model objects.
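Because the mixin is not implemented, the following is only a sketch of the underlying idea: K folds means K distinct train/validation partitions, each of which would need its own pipeline and model. `kfold_indices` is a hypothetical helper written with the standard library only; it is not simpleml API.

```python
def kfold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(kfold_indices(n_samples=7, k=3))
```

Each yielded pair could then be handed to a freshly created pipeline and model, which is the wrapping approach the note above hints at.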

class simpleml.pipelines.validation_split_mixins.NoSplitMixin

Bases: simpleml.pipelines.validation_split_mixins.SplitMixin

split_dataset()

Method to split the dataframe into different sets. By default, sets everything to TRAIN, but can be overridden to add validation, test…

TODO: Work in support for generators (k-fold)

class simpleml.pipelines.validation_split_mixins.RandomSplitMixin(train_size, test_size=None, validation_size=0.0, random_state=123, shuffle=True, **kwargs)

Bases: simpleml.pipelines.validation_split_mixins.SplitMixin

Class to randomly split a dataset into different sets.

split_dataset()

Overrides the base method to split by percentage.
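To make the constructor parameters concrete, here is a minimal standard-library sketch of a percentage split. It reuses the parameter names from the signature above, but `random_split` itself is hypothetical. The treatment of `test_size=None` as "whatever remains", the rounding, and the `VALIDATION`/`TEST` key names are assumptions rather than documented behavior.

```python
import random


def random_split(rows, train_size, test_size=None, validation_size=0.0,
                 random_state=123, shuffle=True):
    """Hypothetical stand-in for a percentage split over a list of rows."""
    if test_size is None:
        # Assumption: an unspecified test size takes whatever is left over
        test_size = 1.0 - train_size - validation_size
    rows = list(rows)
    if shuffle:
        # A seeded generator keeps the split reproducible across runs
        random.Random(random_state).shuffle(rows)
    n = len(rows)
    train_end = round(n * train_size)
    validation_end = train_end + round(n * validation_size)
    test_end = validation_end + round(n * test_size)
    return {
        "TRAIN": rows[:train_end],
        "VALIDATION": rows[train_end:validation_end],
        "TEST": rows[validation_end:test_end],
    }


splits = random_split(range(10), train_size=0.7, validation_size=0.1)
```

Fixing `random_state` means two calls with the same arguments return the same partition, which matters when a pipeline has to be reproduced later.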

class simpleml.pipelines.validation_split_mixins.Split

Bases: dict

Container class for splits

static is_null_type(obj)

Helper to check for nulls. Useful for not passing “empty” attributes, so that defaults of None are returned downstream instead. Example: **split passes only the non-null entries as named parameters.

squeeze()

Helper method to remove any keys whose values are null-type.
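The interplay between `is_null_type`, `squeeze`, and `**split` unpacking can be sketched as follows. `SplitSketch` is a simplified stand-in, not the library class. Which values count as null (here `None` and empty built-in containers) and the choice to return `self` from `squeeze` are assumptions. `fit` is a hypothetical downstream consumer.

```python
class SplitSketch(dict):
    """Simplified stand-in for the documented Split container."""

    @staticmethod
    def is_null_type(obj):
        # Assumption: None and empty built-in containers count as null
        if obj is None:
            return True
        if isinstance(obj, (list, tuple, dict)) and not obj:
            return True
        return False

    def squeeze(self):
        # Collect keys first so the dict is not mutated while iterating
        for key in [k for k, v in self.items() if self.is_null_type(v)]:
            del self[key]
        return self  # assumption: returning self allows chaining


def fit(X=None, y=None):
    """Hypothetical downstream consumer with None defaults."""
    return X, y


split = SplitSketch(X=[[1, 2], [3, 4]], y=[]).squeeze()
result = fit(**split)
```

After squeezing, only `X` remains, so `fit` receives its default of `None` for `y` instead of an empty list.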

class simpleml.pipelines.validation_split_mixins.SplitContainer(default_factory=<class 'simpleml.pipelines.validation_split_mixins.Split'>, **kwargs)

Bases: collections.defaultdict

Explicit instantiation of a defaultdict returning split objects
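A minimal sketch of this pattern, assuming the constructor simply forwards `default_factory` and the keyword arguments to `collections.defaultdict`. The two classes below are stand-ins that mirror the documented names; their bodies are not taken from simpleml.

```python
from collections import defaultdict


class Split(dict):
    """Minimal stand-in for the documented Split container."""


class SplitContainer(defaultdict):
    """Sketch: a defaultdict whose missing keys produce empty Split objects."""

    def __init__(self, default_factory=Split, **kwargs):
        super().__init__(default_factory, **kwargs)


container = SplitContainer(TRAIN=Split(X=[1, 2, 3]))
missing = container["TEST"]  # not set, so an empty Split is created and stored
```

Looking up a split that was never defined therefore yields an empty `Split` rather than raising `KeyError`.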

class simpleml.pipelines.validation_split_mixins.SplitMixin

Bases: object

containerize_split(split_dict)

get_split_names()

split_dataset()

Sets the split criteria. Implementations must set self._dataset_splits.
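The contract above (a subclass overrides `split_dataset` and stores its result in `self._dataset_splits`) can be illustrated with a toy example. Everything here is hypothetical: `SplitMixinSketch`, `FirstHalfTrainMixin`, `ToyPipeline`, and the `rows` attribute are invented for illustration. Having `get_split_names` return the keys of `_dataset_splits` is also an assumption about the real method.

```python
class SplitMixinSketch:
    """Stand-in base: subclasses are expected to set self._dataset_splits."""

    def split_dataset(self):
        raise NotImplementedError

    def get_split_names(self):
        # Assumption: split names are simply the keys of the stored splits
        return list(self._dataset_splits.keys())


class FirstHalfTrainMixin(SplitMixinSketch):
    """Hypothetical mixin: first half of the rows is TRAIN, the rest TEST."""

    def split_dataset(self):
        midpoint = len(self.rows) // 2
        self._dataset_splits = {
            "TRAIN": self.rows[:midpoint],
            "TEST": self.rows[midpoint:],
        }


class ToyPipeline(FirstHalfTrainMixin):
    def __init__(self, rows):
        self.rows = rows


pipeline = ToyPipeline(rows=[10, 20, 30, 40])
pipeline.split_dataset()
names = pipeline.get_split_names()
```

Keeping the split logic in a mixin lets the same pipeline class be combined with different split strategies without changing the pipeline itself.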