simpleml.pipelines.base_pipeline

Base Module for Pipelines

Module Contents
Classes

AbstractPipeline: Abstract Base class for all Pipeline objects.
DatasetSequence: Sequence wrapper for internal datasets. Only used for raw data mapping, so the return type is the internal Split object.
Pipeline: Base class for all Pipeline objects.
TransformedSequence: Nested sequence class to apply transforms on batches in real time and forward them through as the next batch.
class simpleml.pipelines.base_pipeline.AbstractPipeline(has_external_files=True, transformers=None, external_pipeline_class='default', fitted=False, **kwargs)[source]
Bases: future.utils.with_metaclass()

Abstract Base class for all Pipeline objects.

Relies on mixin classes to define the split_dataset method; will throw an error on use otherwise.

params: pipeline parameter metadata for easy insight into hyperparameters across trainings
_create_external_pipeline(self, external_pipeline_class, transformers, **kwargs)[source]
Should return the desired pipeline object.

Parameters:
    external_pipeline_class – str of the class to use; can be 'default' or 'sklearn'
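The selection logic above can be sketched as a simple name-to-class registry. This is illustrative only; `DefaultPipelineDemo` and `SklearnWrapperDemo` are hypothetical stand-ins, not actual simpleml classes:

```python
class DefaultPipelineDemo:
    """Stand-in for a native ('default') pipeline implementation."""
    def __init__(self, transformers, **kwargs):
        self.transformers = transformers

class SklearnWrapperDemo:
    """Stand-in for an sklearn-backed pipeline implementation."""
    def __init__(self, transformers, **kwargs):
        self.transformers = transformers

def create_external_pipeline(external_pipeline_class, transformers, **kwargs):
    """Return the desired pipeline object for a supported class name."""
    registry = {'default': DefaultPipelineDemo, 'sklearn': SklearnWrapperDemo}
    if external_pipeline_class not in registry:
        raise ValueError(
            "external_pipeline_class must be 'default' or 'sklearn', "
            "got: {}".format(external_pipeline_class))
    return registry[external_pipeline_class](transformers, **kwargs)
```

A registry keeps the string-to-implementation mapping in one place, so new backends only require one new entry.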
_generator_transform(self, X, dataset_split=None, **kwargs)[source]
Pass-through method to the external pipeline.

Parameters:
    X – dataframe/matrix to transform; if None, use the internal dataset

NOTE: Downstream objects expect to consume a generator yielding a tuple of (X, y, other…), not a Split object, so an ordered tuple will be returned.
_hash(self)[source]
Hash is the combination of the:
    1) Dataset
    2) Transformers
    3) Transformer Params
    4) Pipeline Config
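One common way to combine such components into a single deterministic identifier is to serialize them in a stable order and digest the result. A minimal sketch, assuming the four inputs are JSON-serializable (this is not simpleml's actual `_hash` implementation):

```python
import hashlib
import json

def pipeline_hash(dataset_hash, transformer_names, transformer_params, pipeline_config):
    """Combine the four hash components into one md5 hex digest.

    sort_keys=True makes dict ordering irrelevant, so logically identical
    configurations always produce the same digest.
    """
    payload = json.dumps(
        [dataset_hash, transformer_names, transformer_params, pipeline_config],
        sort_keys=True,
        default=str,  # fall back to str() for non-JSON types
    )
    return hashlib.md5(payload.encode('utf-8')).hexdigest()
```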
_iterate_split(self, split, infinite_loop=False, batch_size=32, shuffle=True, **kwargs)[source]
Turn a dataset split into a generator.
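The batching behavior described above can be sketched for a list-like split as follows. This is a hedged, self-contained illustration of the shuffle/batch/loop pattern, not simpleml's actual code:

```python
import random

def iterate_split(split, infinite_loop=False, batch_size=32, shuffle=True, seed=None):
    """Yield successive batches from a list-like split, optionally forever."""
    indices = list(range(len(split)))
    rng = random.Random(seed)
    while True:
        if shuffle:
            rng.shuffle(indices)  # reshuffle at the start of every epoch
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            yield [split[i] for i in batch_idx]
        if not infinite_loop:
            break  # single pass unless an endless generator was requested
```

An infinite generator is what frameworks like Keras expect when `steps_per_epoch` is supplied to `fit`.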
_iterate_split_using_sequence(self, split, batch_size=32, shuffle=True, **kwargs)[source]
A different version of _iterate_split that uses a keras.utils.Sequence object to play nicely with Keras and enable thread-safe generation.
_sequence_transform(self, X, dataset_split=None, **kwargs)[source]
Pass-through method to the external pipeline.

Parameters:
    X – dataframe/matrix to transform; if None, use the internal dataset

NOTE: Downstream objects expect to consume a sequence yielding a tuple of (X, y, other…), not a Split object, so an ordered tuple will be returned.
_transform(self, X, dataset_split=None)[source]
Pass-through method to the external pipeline.

Parameters:
    X – dataframe/matrix to transform; if None, use the internal dataset

Return type:
    Split object if no dataset is passed (X is None); otherwise the matrix return of the input X
property external_pipeline(self)[source]
All pipeline objects are going to require some filebase-persisted object.

Wrapper around whatever underlying class is desired (e.g. sklearn or native).
fit_transform(self, **kwargs)[source]
Wrapper for the fit and transform methods. ASSUMES it only applies to the default (train) split.
get_dataset_split(self, split=None, return_generator=False, return_sequence=False, **kwargs)[source]
Get a specific dataset split. Assumes a Split object (simpleml.pipelines.validation_split_mixins.Split) is returned; inherit or implement similar expected attributes to replace.

Uses the internal self._dataset_splits as the split container - assumes dictionary-like itemgetter access.
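The dictionary-like split container can be illustrated with a minimal sketch. `Split` here is a namedtuple stand-in for `simpleml.pipelines.validation_split_mixins.Split`, and the `'TRAIN'` default-split name is an assumption for the example:

```python
from collections import namedtuple

# Stand-in for simpleml.pipelines.validation_split_mixins.Split
Split = namedtuple('Split', ['X', 'y'])

class SplitContainerDemo:
    """Illustrative holder mirroring dictionary-like itemgetter access to splits."""
    def __init__(self, **splits):
        self._dataset_splits = splits

    def get_dataset_split(self, split=None):
        # Fall back to the train split when no name is given (assumed convention)
        name = split if split is not None else 'TRAIN'
        return self._dataset_splits[name]
```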
get_feature_names(self)[source]
Pass-through method to the external pipeline. Should return a list of the final features generated by this pipeline.
save(self, **kwargs)[source]
Extend the parent function with a few additional save routines:
    - save params
    - save transformer metadata
    - features
transform(self, X, return_generator=False, return_sequence=False, **kwargs)[source]
Main transform routine - routes to the generator or regular method depending on the flag.

Parameters:
    return_generator – boolean; whether to use the transformation method that returns a generator object or the regular transformed input
    return_sequence – boolean; whether to use the method that returns a keras.utils.Sequence object to play nicely with Keras models
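The flag-based routing can be sketched with trivial stubs standing in for the three private transform paths (the stub bodies are placeholders, not simpleml's behavior):

```python
def _transform(X):
    """Stub: direct matrix-style transform."""
    return ('matrix', X)

def _generator_transform(X):
    """Stub: generator-style transform."""
    yield ('generator-batch', X)

def _sequence_transform(X):
    """Stub: sequence-style transform (stand-in for a keras.utils.Sequence)."""
    return [('sequence-batch', X)]

def transform(X, return_generator=False, return_sequence=False):
    """Route to the sequence, generator, or regular code path."""
    if return_sequence:
        return _sequence_transform(X)
    if return_generator:
        return _generator_transform(X)
    return _transform(X)
```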
class simpleml.pipelines.base_pipeline.DatasetSequence(split, batch_size, shuffle)[source]
Bases: simpleml.imports.Sequence

Sequence wrapper for internal datasets. Only used for raw data mapping, so the return type is the internal Split object. Transformed sequences are used to conform with external input types (Keras tuples).
__getitem__(self, index)[source]
Gets the batch at position index.

Parameters:
    index – position of the batch in the Sequence

Returns:
    A batch
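The indexed-batch contract of a Sequence-style wrapper can be sketched as follows. `DatasetSequenceDemo` is a hypothetical minimal version, not the actual simpleml class:

```python
import math
import random

class DatasetSequenceDemo:
    """Minimal Sequence-style wrapper: len() is the batch count, [i] the i-th batch."""
    def __init__(self, split, batch_size, shuffle=False, seed=None):
        self.data = list(split)
        self.batch_size = batch_size
        if shuffle:
            random.Random(seed).shuffle(self.data)

    def __len__(self):
        # Number of batches, counting the final partial batch
        return math.ceil(len(self.data) / self.batch_size)

    def __getitem__(self, index):
        # Slice out the batch at position `index`
        start = index * self.batch_size
        return self.data[start:start + self.batch_size]
```

Because each batch is addressed by index rather than pulled from a shared iterator, multiple workers can fetch different batches concurrently, which is what makes the Sequence pattern thread safe.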
class simpleml.pipelines.base_pipeline.Pipeline(has_external_files=True, transformers=None, external_pipeline_class='default', fitted=False, **kwargs)[source]
Bases: simpleml.pipelines.base_pipeline.AbstractPipeline

Base class for all Pipeline objects.

dataset_id: foreign key relation to the dataset used as input
class simpleml.pipelines.base_pipeline.TransformedSequence(pipeline, dataset_sequence)[source]
Bases: simpleml.imports.Sequence

Nested sequence class to apply transforms on batches in real time and forward them through as the next batch.
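The nesting described above can be sketched with a minimal wrapper that transforms each raw batch on access. `TransformedSequenceDemo` is a hypothetical illustration (it takes a transform callable where the real class takes a pipeline), not simpleml's implementation:

```python
class TransformedSequenceDemo:
    """Wrap a raw batch sequence and apply a transform to each batch on access."""
    def __init__(self, transform_fn, dataset_sequence):
        self.transform_fn = transform_fn          # stand-in for pipeline.transform
        self.dataset_sequence = dataset_sequence  # any object with __len__/__getitem__

    def __len__(self):
        # Same number of batches as the underlying raw sequence
        return len(self.dataset_sequence)

    def __getitem__(self, index):
        # Fetch the raw batch lazily, then transform it in real time
        return self.transform_fn(self.dataset_sequence[index])
```

Deferring the transform to `__getitem__` means only the requested batch is ever materialized, so the full transformed dataset never has to fit in memory.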