simpleml.pipelines.validation_split_mixins
Module for the different pipeline split methods used for cross-validation:
No Split – Use all the data; hardcoded as the default for all pipelines
Explicit Split – The dataset class defines the split
Percentage – Random split support for train, validation, and test
Chronological – Time-based split support for train, validation, and test
KFold – Not yet implemented
Module Contents
Classes
KFoldSplitMixin | TBD on how to implement this. KFold requires K models and unique datasets
RandomSplitMixin | Class to randomly split dataset into different sets
Attributes
- class simpleml.pipelines.validation_split_mixins.ChronologicalSplitMixin(**kwargs)
Bases: SplitMixin
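The page gives no details for this class beyond "time based split support for train, validation, test". As a rough illustration of that idea only, the sketch below assigns records to splits by comparing a timestamp against two cutoffs. The function `chronological_split` and its parameters `validation_start`, `test_start`, and `key` are hypothetical names chosen for this example; they are not part of the SimpleML API.

```python
from datetime import date


def chronological_split(records, validation_start, test_start,
                        key=lambda record: record["date"]):
    """Hypothetical helper: bucket records into TRAIN/VALIDATION/TEST by time.

    Records before validation_start go to TRAIN, records from
    validation_start up to (but excluding) test_start go to VALIDATION,
    and everything from test_start onward goes to TEST.
    """
    if validation_start > test_start:
        raise ValueError("validation_start must not be later than test_start")

    splits = {"TRAIN": [], "VALIDATION": [], "TEST": []}
    # Sort first so each split is itself in chronological order
    for record in sorted(records, key=key):
        timestamp = key(record)
        if timestamp < validation_start:
            splits["TRAIN"].append(record)
        elif timestamp < test_start:
            splits["VALIDATION"].append(record)
        else:
            splits["TEST"].append(record)
    return splits


if __name__ == "__main__":
    rows = [{"date": date(2020, month, 1)} for month in (5, 1, 9, 3, 12)]
    result = chronological_split(rows, date(2020, 6, 1), date(2020, 10, 1))
    for name, items in result.items():
        print(name, [item["date"].isoformat() for item in items])
```

Splitting on cutoffs rather than shuffling keeps every test record later in time than every training record, which is the usual motivation for a chronological split.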
- class simpleml.pipelines.validation_split_mixins.ExplicitSplitMixin
Bases: SplitMixin
- class simpleml.pipelines.validation_split_mixins.KFoldSplitMixin
Bases: SplitMixin
Not yet implemented; the approach is still to be determined. KFold requires K models and unique datasets, so it may be easier to wrap a parallelized implementation that internally creates K new Pipeline and Model objects.
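To make the "K models and unique datasets" point concrete, here is a minimal sketch of how K-fold partitioning produces K distinct train/test pairs, each of which would need its own fitted model. It uses only the standard library; `k_fold_indices` is a hypothetical helper written for this illustration and is not a SimpleML function.

```python
def k_fold_indices(n_rows, k):
    """Hypothetical helper: yield (train_indices, test_indices) for k folds.

    Folds are contiguous and unshuffled. When n_rows is not divisible by k,
    the first n_rows % k folds each receive one extra row.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")

    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_indices = list(range(start, stop))
        # Every row outside the held-out fold is used for training
        train_indices = list(range(0, start)) + list(range(stop, n_rows))
        yield train_indices, test_indices
        start = stop


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, 3)):
        print(f"fold {fold}: train={train} test={test}")
```

Each yielded pair is a different dataset view, which is why a single pipeline object with one fixed TRAIN/TEST split does not map cleanly onto K-fold validation.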
- class simpleml.pipelines.validation_split_mixins.RandomSplitMixin(train_size, test_size=None, validation_size=0.0, random_state=123, shuffle=True, **kwargs)
Bases: SplitMixin
Class to randomly split dataset into different sets
Redefines the splits, so custom-named splits in the dataset cannot be referenced by their original names; only TRAIN, TEST, and VALIDATION are available.
Sets the splitting parameters. By default, validation_size is 0.0 because the validation split is only used for hyperparameter tuning.
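The sketch below shows one way a seeded random split into TRAIN, VALIDATION, and TEST could work. It reuses the parameter names from the signature above, but the body is an assumption and not the library's implementation: sizes are treated as fractions of the dataset only, and an unspecified test_size is assumed to take whatever fraction remains. `random_split` is a hypothetical standalone function.

```python
import random


def random_split(rows, train_size, test_size=None, validation_size=0.0,
                 random_state=123, shuffle=True):
    """Hypothetical helper: split rows into TRAIN/VALIDATION/TEST by fraction."""
    if test_size is None:
        # Assumption: the test split takes the fraction left over
        test_size = max(0.0, 1.0 - train_size - validation_size)
    sizes = (train_size, validation_size, test_size)
    if any(size < 0 for size in sizes) or sum(sizes) > 1.0 + 1e-9:
        raise ValueError("split fractions must be non-negative and sum to at most 1")

    indices = list(range(len(rows)))
    if shuffle:
        # A dedicated Random instance keeps the split reproducible
        random.Random(random_state).shuffle(indices)

    train_end = round(train_size * len(rows))
    validation_end = train_end + round(validation_size * len(rows))
    test_end = validation_end + round(test_size * len(rows))
    return {
        "TRAIN": [rows[i] for i in indices[:train_end]],
        "VALIDATION": [rows[i] for i in indices[train_end:validation_end]],
        "TEST": [rows[i] for i in indices[validation_end:test_end]],
    }


if __name__ == "__main__":
    splits = random_split(list(range(10)), train_size=0.7, validation_size=0.1)
    for name, items in splits.items():
        print(name, items)
```

With the default validation_size of 0.0 the VALIDATION list is simply empty, matching the note that validation is only needed for hyperparameter tuning.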
- Parameters