simpleml.pipelines

Import modules to register class names in global registry

Define convenience classes composed of different mixins

Package Contents

Classes

Pipeline

Abstract Base class for all Pipelines objects.

Attributes

__author__

simpleml.pipelines.__author__ = Elisha Yadgaran
class simpleml.pipelines.Pipeline(has_external_files=True, transformers=None, fitted=False, dataset_id=None, **kwargs)

Bases: simpleml.persistables.base_persistable.Persistable

Abstract Base class for all Pipelines objects.

Relies on mixin classes to define the split_dataset method. Otherwise, an error is raised on use.

params: pipeline parameter metadata for easy insight into hyperparameters across trainings

Parameters
  • has_external_files (bool) –

  • transformers (Optional[List[Any]]) –

  • fitted (bool) –

  • dataset_id (Optional[Union[str, uuid.uuid4]]) –

object_type :str = PIPELINE
X(self, split=None)

Get X for specific dataset split

Parameters

split (Optional[str]) –

Return type

Any

__post_restore__(self)

Extend main load routine to load relationship class

Return type

None

abstract _create_external_pipeline(self, *args, **kwargs)

Each subclass should instantiate the respective pipeline library.

_filter_fit_params(self, split)

Helper to filter unsupported fit params from dataset splits

Parameters

split (simpleml.pipelines.projected_splits.ProjectedDatasetSplit) –

Return type

Dict[str, Any]
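
To illustrate the general idea of filtering unsupported fit params, here is a minimal sketch that keeps only the keyword arguments a fit function declares. The helper name `filter_supported_params`, the use of `inspect.signature`, and the toy `fit` function are assumptions for illustration; the actual `_filter_fit_params` logic may work differently.

```python
import inspect
from typing import Any, Callable, Dict


def filter_supported_params(fit_method: Callable, candidates: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keys that fit_method accepts."""
    parameters = inspect.signature(fit_method).parameters
    # A **kwargs parameter means every keyword is accepted
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return dict(candidates)
    return {k: v for k, v in candidates.items() if k in parameters}


def fit(X, y=None):
    """Toy fit function that does not accept sample_weight."""
    return X, y


# Toy stand-in for the attributes exposed by a dataset split
split = {"X": [[1], [2]], "y": [0, 1], "sample_weight": [1.0, 1.0]}
supported = filter_supported_params(fit, split)
```

In this sketch `supported` contains only the `X` and `y` entries, so `fit(**supported)` can be called without an unexpected keyword error.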

_hash(self)
The hash is a combination of the:
  1. Dataset

  2. Transformers

  3. Transformer Params

  4. Pipeline Config

Return type

str
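
As a rough illustration of how the four components listed above could be folded into one deterministic string, consider the sketch below. The function name `combined_hash`, the JSON serialization, and the choice of SHA-256 are all assumptions made for this example, not a description of how `_hash` is implemented.

```python
import hashlib
import json


def combined_hash(dataset_hash, transformers, transformer_params, pipeline_config):
    """Hypothetical helper: fold four components into one hex digest."""
    payload = {
        "dataset": dataset_hash,
        "transformers": transformers,
        "transformer_params": transformer_params,
        "pipeline_config": pipeline_config,
    }
    # sort_keys makes the serialization independent of dict insertion order
    serialized = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


params = {"scaler": {"with_mean": True}}
config = {"has_external_files": True}

h1 = combined_hash("abc123", ["scaler", "encoder"], params, config)
h2 = combined_hash("abc123", ["scaler", "encoder"], params, config)
h3 = combined_hash("abc123", ["encoder", "scaler"], params, config)
```

Identical inputs give identical digests (`h1 == h2`), while changing the transformer order changes the digest (`h1 != h3`), which is the property that makes such a hash useful for detecting duplicate pipelines.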

_load_dataset(self)

Helper to fetch the dataset

_transform(self, X, dataset_split=None)

Pass through method to external pipeline

Parameters
  • X (Any) – dataframe/matrix to transform, if None, use internal dataset

  • dataset_split (Optional[str]) –

Return type

Split object if no dataset is passed (X is None); otherwise the transformed matrix for the input X

add_dataset(self, dataset)

Setter method for dataset used

Parameters

dataset (simpleml.datasets.base_dataset.Dataset) –

Return type

None

add_transformer(self, name, transformer)

Setter method for new transformer step

Parameters
  • name (str) –

  • transformer (Any) –

Return type

None

assert_dataset(self, msg='')

Helper method to raise an error if dataset isn’t present

Parameters

msg (str) –

Return type

None

assert_fitted(self, msg='')

Helper method to raise an error if pipeline isn’t fit

Parameters

msg (str) –

Return type

None
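
The guard pattern behind assert_dataset and assert_fitted can be sketched as follows. The class `GuardedPipeline` and the exception `PipelineNotFittedError` are hypothetical names used only here; the error type that the real methods raise is not stated in this reference.

```python
class PipelineNotFittedError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


class GuardedPipeline:
    def __init__(self):
        self.fitted = False

    def assert_fitted(self, msg=""):
        # Raise early with a caller-supplied message if the precondition fails
        if not self.fitted:
            raise PipelineNotFittedError(msg)

    def transform(self, X):
        self.assert_fitted("Must fit pipeline before transforming")
        return X


pipeline = GuardedPipeline()
try:
    pipeline.transform([1, 2, 3])
    error_message = None
except PipelineNotFittedError as error:
    error_message = str(error)
```

Passing the message at the call site lets each caller explain why the precondition matters for that particular operation.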

property dataset(self)

Uses a weakref to bind the linked dataset so it doesn't bloat memory usage. Returns the dataset if it is still available, or tries to fetch it otherwise.
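
The weakref-with-fallback pattern described here might look like the sketch below. `Dataset`, `PipelineSketch`, and the `loader` callable are stand-ins invented for the example; how the real property fetches a missing dataset is an assumption.

```python
import weakref


class Dataset:
    """Stand-in for a dataset object (hypothetical)."""

    def __init__(self, dataset_id):
        self.id = dataset_id


class PipelineSketch:
    def __init__(self, dataset_id, loader):
        self.dataset_id = dataset_id
        self._loader = loader      # hypothetical callable: id -> Dataset
        self._dataset_ref = None   # weakref.ref to the dataset, or None

    @property
    def dataset(self):
        # Calling a weakref.ref returns None once the target is collected
        dataset = self._dataset_ref() if self._dataset_ref is not None else None
        if dataset is None:
            dataset = self._loader(self.dataset_id)
            self._dataset_ref = weakref.ref(dataset)
        return dataset


load_calls = []


def loader(dataset_id):
    load_calls.append(dataset_id)
    return Dataset(dataset_id)


pipeline = PipelineSketch("d1", loader)
first = pipeline.dataset   # triggers a load
same = pipeline.dataset    # reuses the live object through the weakref
```

Because the pipeline holds only a weak reference, the dataset can be garbage collected once no other code holds it, and the next access simply loads it again.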

property external_pipeline(self)

All pipeline objects require some file-based persisted object

Wrapper around whatever underlying class is desired (eg sklearn or native)

Return type

Any

fit(self)

Pass through method to external pipeline

fit_transform(self, **kwargs)

Wrapper for the fit and transform methods. Assumes it only applies to the default (train) split.

Return type

Any

property fitted(self)
Return type

bool

get_dataset_split(self, split=None)

Get a specific dataset split. Assumes a ProjectedDatasetSplit object (simpleml.pipelines.projected_splits.ProjectedDatasetSplit) is returned. Inherit from it or implement similar expected attributes to replace it.

Uses the internal self._dataset_splits as the split container, which is assumed to support dictionary-like item access.

Parameters

split (Optional[str]) –

Return type

simpleml.pipelines.projected_splits.ProjectedDatasetSplit

get_feature_names(self)

Pass-through method to the external pipeline. Should return a list of the final features generated by this pipeline.

Return type

List[str]

get_params(self, **kwargs)

Pass through method to external pipeline

get_split_names(self)
Return type

List[str]

get_transformers(self)

Pass through method to external pipeline

remove_transformer(self, name)

Delete method for transformer step

Parameters

name (str) –

Return type

None
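
Taken together, add_transformer and remove_transformer manage a sequence of named steps. A minimal sketch of that idea, assuming an ordered list of (name, transformer) pairs, is shown below. The class `TransformerSteps` and its `names` helper are hypothetical, and behavior such as duplicate-name handling in the real methods is not covered.

```python
class TransformerSteps:
    """Hypothetical container of ordered (name, transformer) steps."""

    def __init__(self):
        self._steps = []

    def add_transformer(self, name, transformer):
        # Append to the end so steps run in insertion order
        self._steps.append((name, transformer))

    def remove_transformer(self, name):
        # Drop every step registered under this name
        self._steps = [(n, t) for n, t in self._steps if n != name]

    def names(self):
        return [n for n, _ in self._steps]


steps = TransformerSteps()
steps.add_transformer("impute", object())
steps.add_transformer("scale", object())
steps.add_transformer("encode", object())
steps.remove_transformer("scale")
remaining = steps.names()
```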

save(self, **kwargs)

Extend parent function with a few additional save routines

  1. save params

  2. save transformer metadata

  3. features

Return type

None

set_params(self, **params)

Pass through method to external pipeline

split_dataset(self)

Method to create a cached reference to the projected data (can't use the dataset directly because of mutation concerns)

Non-split mixin class. Returns full dataset for any split name

Return type

None
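
One way to get "the full dataset for any split name" from a dictionary-like container is a `defaultdict` whose factory always returns the same object. The sketch below uses a plain dict as a toy stand-in for the projected split; whether the non-split mixin is implemented this way is an assumption.

```python
from collections import defaultdict

# Toy stand-in for the full (unsplit) projected dataset
full_dataset = {"X": [[1], [2], [3]], "y": [0, 1, 0]}

# Every key, including ones never seen before, maps to the same full dataset
dataset_splits = defaultdict(lambda: full_dataset)

train = dataset_splits["TRAIN"]
anything = dataset_splits["made_up_name"]
```

Both lookups return the very same object, so callers that request a named split still work when no splitting is configured.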

transform(self, X, **kwargs)

Main transform routine - routes to generator or regular method depending on the flag

Parameters
  • return_generator – boolean, whether to use the transformation method that returns a generator object or the regular transformed input

  • return_sequence – boolean, whether to use the method that returns a keras.utils.Sequence object to play nice with keras models

  • X (Any) –

Return type

Any
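
The flag-based routing can be pictured with the sketch below, which dispatches to either an eager or a lazy transformation. The function names and the doubling "transformation" are placeholders invented for the example; only the `return_generator` flag name comes from the description above.

```python
from typing import Iterable, Iterator, List, Union


def _transform_eager(rows: Iterable[int]) -> List[int]:
    """Placeholder transformation that materializes all results at once."""
    return [row * 2 for row in rows]


def _transform_lazy(rows: Iterable[int]) -> Iterator[int]:
    """Placeholder transformation that yields results one at a time."""
    for row in rows:
        yield row * 2


def transform(rows: Iterable[int], return_generator: bool = False) -> Union[List[int], Iterator[int]]:
    # Route on the flag: lazy generator or regular transformed output
    if return_generator:
        return _transform_lazy(rows)
    return _transform_eager(rows)


eager_result = transform([1, 2, 3])
lazy_result = transform([1, 2, 3], return_generator=True)
lazy_values = list(lazy_result)
```

The generator route is useful when the transformed data is too large to hold in memory, since values are produced only as the consumer iterates.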

y(self, split=None)

Get labels for specific dataset split

Parameters

split (Optional[str]) –

Return type

Any