simpleml.pipelines.base_pipeline module

Base Module for Pipelines

class simpleml.pipelines.base_pipeline.AbstractPipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)[source]

Bases: simpleml.persistables.base_persistable.Persistable, simpleml.persistables.saving.AllSaveMixin

Abstract base class for all Pipeline objects.

Relies on mixin classes to define the split_dataset method; an error is raised on use if no mixin supplies it.

params: pipeline parameter metadata for easy insight into hyperparameters across trainings

X(split=None)[source]

Get X for specific dataset split

add_dataset(dataset)[source]

Setter method for dataset used

add_transformer(name, transformer)[source]

Setter method for new transformer step
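
A minimal sketch of registering and removing steps; the pipeline name and the sklearn transformer are illustrative assumptions:

    from sklearn.preprocessing import StandardScaler
    from simpleml.pipelines.base_pipeline import Pipeline

    pipeline = Pipeline(name='example-pipeline')  # `name` assumed accepted by the Persistable base

    # Register a named transformer step
    pipeline.add_transformer('scale', StandardScaler())

    # Steps can later be dropped by name
    pipeline.remove_transformer('scale')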

assert_dataset(msg='')[source]

Helper method to raise an error if dataset isn’t present

assert_fitted(msg='')[source]

Helper method to raise an error if pipeline isn’t fit
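
These helpers enable fail-fast guard clauses in subclass methods; a sketch, where ReportingPipeline and summarize_features are hypothetical:

    from simpleml.pipelines.base_pipeline import Pipeline

    class ReportingPipeline(Pipeline):
        def summarize_features(self):
            # Fail fast with descriptive errors instead of obscure downstream failures
            self.assert_dataset(msg='Add a dataset before summarizing features')
            self.assert_fitted(msg='Fit the pipeline before summarizing features')
            return self.get_feature_names()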

external_pipeline

All pipeline objects require some file-based persisted object.

Wrapper around whatever underlying class is desired (e.g. sklearn or native)

fit()[source]

Pass-through method to the external pipeline

fit_transform(**kwargs)[source]

Wrapper for the fit and transform methods. ASSUMES it only applies to the default (train) split
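
A sketch of the typical fit flow, assuming a SimpleML dataset persistable already exists (dataset construction is outside this module):

    from sklearn.preprocessing import StandardScaler
    from simpleml.pipelines.base_pipeline import Pipeline

    # (name, transformer) tuples are assumed, sklearn-style; the exact entry
    # format depends on the chosen external pipeline class
    pipeline = Pipeline(transformers=[('scale', StandardScaler())])
    pipeline.add_dataset(dataset)           # `dataset` is an existing SimpleML dataset persistable
    transformed = pipeline.fit_transform()  # only applies to the default (train) split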

fitted

get_dataset_split(split=None, return_generator=False, return_sequence=False, **kwargs)[source]

Get a specific dataset split. Assumes a Split object (simpleml.pipelines.validation_split_mixins.Split) is returned; inherit or implement an object with similar expected attributes to replace it.

Uses the internal self._dataset_splits as the split container; assumes dictionary-like item access.
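
Continuing the sketch above, and assuming 'TRAIN' is a split name defined by the configured split mixin:

    split = pipeline.get_dataset_split(split='TRAIN')
    # The Split container is assumed to expose X/y attributes
    X_train, y_train = split.X, split.y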

get_feature_names()[source]

Pass-through method to the external pipeline. Should return a list of the final features generated by this pipeline

get_params(**kwargs)[source]

Pass-through method to the external pipeline

get_transformers()[source]

Pass-through method to the external pipeline

load(**kwargs)[source]

Extend main load routine to load relationship class

object_type = 'PIPELINE'

params = Column(None, JSON(), table=None, default=ColumnDefault({}))

remove_transformer(name)[source]

Delete method for transformer step

save(**kwargs)[source]

Extend parent function with a few additional save routines

  1. save params
  2. save transformer metadata
  3. save features
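
For example, continuing a fitted pipeline (a minimal sketch):

    pipeline.fit()
    pipeline.save()  # additionally persists params, transformer metadata, and features
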
set_params(**params)[source]

Pass-through method to the external pipeline

transform(X, return_generator=False, return_sequence=False, **kwargs)[source]

Main transform routine; routes to the generator, sequence, or regular method depending on the flags

Parameters:

    return_generator – boolean, whether to use the transformation method that returns a generator object, or the regular transformed input
    return_sequence – boolean, whether to use the method that returns a keras.utils.Sequence object to play nice with Keras models
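
A sketch of the three modes, assuming pipeline is fitted and X is compatible input (both hypothetical here):

    # Fully materialized transform
    output = pipeline.transform(X)

    # Lazy alternative: a generator of transformed batches
    batches = pipeline.transform(X, return_generator=True)

    # keras.utils.Sequence wrapper for direct consumption by Keras models
    sequence = pipeline.transform(X, return_sequence=True)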

y(split=None)[source]

Get labels for specific dataset split
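
Together with X, this gives split-aware access to the attached dataset; a sketch, assuming 'TRAIN' is a valid split name for the configured split mixin:

    X_train = pipeline.X(split='TRAIN')
    y_train = pipeline.y(split='TRAIN')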

class simpleml.pipelines.base_pipeline.DatasetSequence(split, batch_size, shuffle)[source]

Bases: keras.utils.Sequence

Sequence wrapper for internal datasets. Only used for raw data mapping, so the return type is the internal Split object. Transformed sequences are used to conform to external input types (Keras tuples)
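
A sketch of constructing the raw-data sequence, assuming train_split came from get_dataset_split and that the batch_size/shuffle values are illustrative:

    from simpleml.pipelines.base_pipeline import DatasetSequence

    raw_sequence = DatasetSequence(split=train_split, batch_size=32, shuffle=True)
    # Indexing is assumed to follow keras.utils.Sequence semantics; items are
    # internal Split objects rather than Keras-ready tuples
    first_batch = raw_sequence[0]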

on_epoch_end()[source]

Method called at the end of every epoch.

static validated_split(split)[source]

Confirms data is valid, otherwise returns None (makes downstream checking simpler)
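
This supports a None-propagation pattern in callers; a sketch, where maybe_split and process are hypothetical:

    checked = DatasetSequence.validated_split(maybe_split)
    if checked is not None:
        process(checked)  # `process` is a stand-in for downstream handling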

class simpleml.pipelines.base_pipeline.Pipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)[source]

Bases: simpleml.pipelines.base_pipeline.AbstractPipeline

Base class for all Pipeline objects.

dataset_id: foreign key relation to the dataset used as input

author
created_timestamp
dataset
dataset_id
filepaths
has_external_files
hash_
id
metadata_
modified_timestamp
name
params
project
registered_name
version
version_description
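
An end-to-end sketch, assuming sklearn-style (name, transformer) steps, an existing SimpleML dataset persistable, and 'TEST' as a configured split name:

    from sklearn.preprocessing import StandardScaler
    from simpleml.pipelines.base_pipeline import Pipeline

    pipeline = Pipeline(transformers=[('scale', StandardScaler())])
    pipeline.add_dataset(dataset)  # `dataset` built elsewhere in SimpleML
    pipeline.fit()
    pipeline.save()

    features = pipeline.get_feature_names()
    X_test_transformed = pipeline.transform(pipeline.X(split='TEST'))
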
class simpleml.pipelines.base_pipeline.TransformedSequence(pipeline, dataset_sequence)[source]

Bases: keras.utils.Sequence

Nested sequence class to apply transforms on batches in real time and forward them through as the next batch

on_epoch_end()[source]

Method called at the end of every epoch.
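
A sketch of chaining the two sequence classes so transforms run per batch during training; model is a hypothetical compiled Keras model:

    raw_sequence = DatasetSequence(split=train_split, batch_size=32, shuffle=True)
    transformed_sequence = TransformedSequence(pipeline, raw_sequence)

    # Each batch is transformed on the fly and yielded in Keras-ready form;
    # recent Keras versions accept a Sequence directly in fit
    model.fit(transformed_sequence, epochs=10)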