simpleml.pipelines.base_pipeline
Base Module for Pipelines
Module Contents
Classes
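AbstractPipeline | Abstract base class for all Pipeline objects.
DatasetSequence | Sequence wrapper for internal datasets. Only used for raw data mapping, so the return type is the internal Split object.
Pipeline | Base class for all Pipeline objects.
TransformedSequence | Nested sequence class that applies transforms to batches in real time and forwards each result as the next batch.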
-
class simpleml.pipelines.base_pipeline.AbstractPipeline(has_external_files=True, transformers=None, external_pipeline_class='default', fitted=False, **kwargs)
Bases: future.utils.with_metaclass()
Abstract base class for all Pipeline objects.
Relies on mixin classes to define the split_dataset method; using the class without such a mixin raises an error.
params: pipeline parameter metadata for easy insight into hyperparameters across trainings
-
_create_external_pipeline(self, external_pipeline_class, transformers, **kwargs)
Should return the desired pipeline object.
- Parameters
external_pipeline_class – str naming the class to use; can be 'default' or 'sklearn'
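The dispatch on external_pipeline_class can be pictured as a small factory. The sketch below is illustrative only: DefaultPipelineSketch, SklearnLikePipelineSketch, and create_external_pipeline are hypothetical stand-ins, not SimpleML classes, and the real method may construct its objects differently.

```python
class DefaultPipelineSketch:
    """Hypothetical stand-in for a native pipeline: an ordered list of transformers."""

    def __init__(self, transformers, **kwargs):
        self.transformers = list(transformers or [])
        self.options = kwargs


class SklearnLikePipelineSketch(DefaultPipelineSketch):
    """Hypothetical stand-in for an sklearn-backed pipeline."""


# Registry keyed by the two string values the documentation mentions.
_PIPELINE_CLASSES = {
    "default": DefaultPipelineSketch,
    "sklearn": SklearnLikePipelineSketch,
}


def create_external_pipeline(external_pipeline_class, transformers, **kwargs):
    """Return an instance of the class registered under the given name."""
    try:
        cls = _PIPELINE_CLASSES[external_pipeline_class]
    except KeyError:
        raise ValueError(
            "Unknown pipeline class: {!r}".format(external_pipeline_class)
        )
    return cls(transformers, **kwargs)
```

A registry keeps the string-to-class mapping in one place, so supporting another backend would only require a new dictionary entry.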
-
_generator_transform(self, X, dataset_split=None, **kwargs)
Pass-through method to the external pipeline.
- Parameters
X – dataframe/matrix to transform; if None, use the internal dataset
NOTE: Downstream objects expect to consume a generator that yields tuples of (X, y, other…), not Split objects, so an ordered tuple is returned.
-
_hash(self)
The hash is a combination of: 1) Dataset 2) Transformers 3) Transformer Params 4) Pipeline Config
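One way to fold the four components listed above into a single value is to serialize them deterministically and digest the result. This is a minimal sketch under stated assumptions: combined_hash and its argument names are hypothetical, and the choice of JSON plus SHA-256 is an illustration, not the hashing scheme SimpleML actually uses.

```python
import hashlib
import json


def combined_hash(dataset_hash, transformers, transformer_params, pipeline_config):
    """Fold the four components into one hex digest (illustrative only)."""
    payload = {
        "dataset": dataset_hash,
        "transformers": transformers,  # e.g. an ordered list of class names
        "transformer_params": transformer_params,
        "pipeline_config": pipeline_config,
    }
    # sort_keys makes the serialization independent of dict insertion order;
    # default=str is a fallback for values JSON cannot encode natively.
    serialized = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Because every component feeds the digest, changing any one of them (for example a single transformer parameter) yields a different hash, which is what lets a hash identify a specific pipeline configuration.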
-
_iterate_split(self, split, infinite_loop=False, batch_size=32, shuffle=True, **kwargs)
Turn a dataset split into a generator.
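The batching behaviour implied by the infinite_loop, batch_size, and shuffle arguments can be sketched with a plain Python generator. The function iterate_batches, its seed argument, and the use of a simple list as input are assumptions made for illustration; the real method operates on a Split object.

```python
import random


def iterate_batches(rows, infinite_loop=False, batch_size=32, shuffle=True, seed=None):
    """Yield lists of rows, batch_size at a time (hypothetical helper)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    if not indices:
        return  # nothing to yield; also avoids spinning forever when looping
    while True:
        if shuffle:
            rng.shuffle(indices)  # reshuffle at the start of every pass
        for start in range(0, len(indices), batch_size):
            yield [rows[i] for i in indices[start:start + batch_size]]
        if not infinite_loop:
            break
```

With infinite_loop=True the generator restarts from the first batch after the last one, which suits training loops that request a fixed number of steps instead of a single pass over the data.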
-
_iterate_split_using_sequence(self, split, batch_size=32, shuffle=True, **kwargs)
Variant of _iterate_split that uses a keras.utils.Sequence object to work well with Keras and enable thread-safe generation.
-
_sequence_transform(self, X, dataset_split=None, **kwargs)
Pass-through method to the external pipeline.
- Parameters
X – dataframe/matrix to transform; if None, use the internal dataset
NOTE: Downstream objects expect to consume a sequence that returns tuples of (X, y, other…), not Split objects, so an ordered tuple is returned.
-
_transform(self, X, dataset_split=None)
Pass-through method to the external pipeline.
- Parameters
X – dataframe/matrix to transform; if None, use the internal dataset
- Return type
Split object if no dataset is passed (X is None); otherwise the transformed matrix for the input X
-
property external_pipeline(self)
All pipeline objects require some file-based persisted object.
Wrapper around whatever underlying class is desired (e.g. sklearn or native).
-
fit_transform(self, **kwargs)
Wrapper for the fit and transform methods. ASSUMES it only applies to the default (train) split.
-
get_dataset_split(self, split=None, return_generator=False, return_sequence=False, **kwargs)
Get a specific dataset split. Assumes a Split object (simpleml.pipelines.validation_split_mixins.Split) is returned; to replace it, inherit from it or implement similar expected attributes.
Uses the internal self._dataset_splits as the split container, which is assumed to support dictionary-like item access.
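The container contract described here (item access by split name, returning an object with the expected attributes) can be illustrated without SimpleML itself. In this sketch SplitSketch, get_split, the attribute names X, y, and other, and the "TRAIN"/"TEST" keys are all hypothetical; the real Split class and key names may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SplitSketch:
    """Hypothetical stand-in for a Split: named sections of one dataset split."""
    X: Any = None
    y: Any = None
    other: Any = None


# Any object supporting container[key] would work; a dict is the simplest.
dataset_splits = {
    "TRAIN": SplitSketch(X=[[1], [2]], y=[0, 1]),
    "TEST": SplitSketch(X=[[3]], y=[1]),
}


def get_split(splits, split=None, default="TRAIN"):
    """Look up a split by key, falling back to an assumed default key."""
    key = split if split is not None else default
    return splits[key]
```

Since only item access is required of the container, a custom mapping (for instance one that computes splits lazily) could be substituted for the dictionary.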
-
get_feature_names(self)
Pass-through method to the external pipeline. Should return a list of the final features generated by this pipeline.
-
save(self, **kwargs)
Extends the parent function with a few additional save routines:
save params
save transformer metadata
features
-
transform(self, X, return_generator=False, return_sequence=False, **kwargs)
Main transform routine; routes to the generator, sequence, or regular method depending on the flags.
- Parameters
return_generator – boolean, whether to use the transformation method that returns a generator object or the regular transformed input
return_sequence – boolean, whether to use the method that returns a keras.utils.Sequence object to work well with Keras models
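The flag-based routing can be sketched as a simple dispatch over three private methods. RoutingSketch is a hypothetical class with toy transforms (doubling each value); the order in which the two flags are checked when both are set is an assumption of this sketch, since the documentation does not state it.

```python
class RoutingSketch:
    """Hypothetical class showing how one public method can route to three variants."""

    def _transform(self, X):
        # Regular path: return the fully transformed input
        return [x * 2 for x in X]

    def _generator_transform(self, X):
        # Generator path: produce transformed values lazily
        return (x * 2 for x in X)

    def _sequence_transform(self, X):
        # Sequence path: a tuple stands in for a keras.utils.Sequence object
        return tuple(x * 2 for x in X)

    def transform(self, X, return_generator=False, return_sequence=False):
        # Assumption: the sequence flag takes precedence if both are True
        if return_sequence:
            return self._sequence_transform(X)
        if return_generator:
            return self._generator_transform(X)
        return self._transform(X)
```

Keeping a single public entry point means callers choose the return type with a keyword argument instead of calling a different method.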
-
-
class simpleml.pipelines.base_pipeline.DatasetSequence(split, batch_size, shuffle)
Bases: simpleml.imports.Sequence
Sequence wrapper for internal datasets. Only used for raw data mapping, so the return type is the internal Split object. Transformed sequences are used to conform with external input types (keras tuples).
-
__getitem__(self, index)
Gets the batch at position index.
- Parameters
index – position of the batch in the Sequence
- Returns
A batch
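Index-based batch access follows the contract that keras.utils.Sequence expects from subclasses: __len__ gives the number of batches and __getitem__ returns the batch at a position. The sketch below reproduces that contract in plain Python so it runs without Keras; BatchSequenceSketch, its seed argument, and the list-based rows are hypothetical and do not reflect how DatasetSequence slices a Split.

```python
import math
import random


class BatchSequenceSketch:
    """Hypothetical index-addressable batches over a list of rows."""

    def __init__(self, rows, batch_size, shuffle, seed=None):
        self.rows = rows
        self.batch_size = batch_size
        self.indices = list(range(len(rows)))
        if shuffle:
            random.Random(seed).shuffle(self.indices)

    def __len__(self):
        # Number of batches, counting a final partial batch
        return math.ceil(len(self.rows) / self.batch_size)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        chosen = self.indices[start:start + self.batch_size]
        return [self.rows[i] for i in chosen]
```

Because any batch can be requested by position, separate workers can fetch different indices independently, which is the property that makes this style friendlier to multi-threaded loading than a shared generator.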
-
-
class simpleml.pipelines.base_pipeline.Pipeline(has_external_files=True, transformers=None, external_pipeline_class='default', fitted=False, **kwargs)
Bases: simpleml.pipelines.base_pipeline.AbstractPipeline
Base class for all Pipeline objects.
dataset_id: foreign key relation to the dataset used as input
-
class simpleml.pipelines.base_pipeline.TransformedSequence(pipeline, dataset_sequence)
Bases: simpleml.imports.Sequence
Nested sequence class that applies transforms to batches in real time and forwards each result as the next batch.
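The nesting idea (wrap one sequence in another and transform each batch only when it is requested) can be sketched as follows. TransformedSequenceSketch is hypothetical: it takes a plain callable where the real class takes a pipeline object, and a list of lists stands in for the inner dataset sequence.

```python
class TransformedSequenceSketch:
    """Hypothetical wrapper that transforms batches of an inner sequence on access."""

    def __init__(self, transform, dataset_sequence):
        self.transform = transform  # callable applied to each raw batch
        self.dataset_sequence = dataset_sequence

    def __len__(self):
        # Same number of batches as the wrapped sequence
        return len(self.dataset_sequence)

    def __getitem__(self, index):
        # Fetch the raw batch, then transform it at access time
        raw_batch = self.dataset_sequence[index]
        return self.transform(raw_batch)


raw_batches = [[1, 2], [3]]
scaled = TransformedSequenceSketch(lambda batch: [x * 10 for x in batch], raw_batches)
```

Transforming at access time avoids materializing the whole transformed dataset in memory; only the batch currently requested is processed.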