simpleml.pipelines.base_pipeline

Base Module for Pipelines

Module Contents

Classes

Pipeline

Abstract base class for all Pipeline objects.

Attributes

LOGGER

__author__

simpleml.pipelines.base_pipeline.LOGGER[source]
simpleml.pipelines.base_pipeline.__author__ = Elisha Yadgaran[source]
class simpleml.pipelines.base_pipeline.Pipeline(has_external_files=True, transformers=None, fitted=False, dataset_id=None, **kwargs)[source]

Bases: simpleml.persistables.base_persistable.Persistable

Abstract base class for all Pipeline objects.

Relies on mixin classes to define the split_dataset method. Raises an error on use otherwise.

params: pipeline parameter metadata for easy insight into hyperparameters across trainings

Parameters
  • has_external_files (bool) –

  • transformers (Optional[List[Any]]) –

  • fitted (bool) –

  • dataset_id (Optional[Union[str, uuid.uuid4]]) –
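The mixin arrangement described in the class summary can be pictured with a minimal sketch. The names BasePipelineSketch, FullDatasetSplitMixin, and SketchPipeline are hypothetical stand-ins rather than SimpleML classes, and the error type is an assumption. The sketch only shows how a mixin listed first among the bases supplies split_dataset.

```python
class BasePipelineSketch:
    """Hypothetical stand-in for the abstract base; not SimpleML's implementation."""

    def __init__(self, data):
        self.data = data
        self._dataset_splits = None

    def split_dataset(self):
        # Assumed behavior: the base alone has no split strategy.
        raise NotImplementedError("combine with a mixin that defines split_dataset")


class FullDatasetSplitMixin:
    """Hypothetical mixin: a single split holding the full dataset."""

    def split_dataset(self):
        self._dataset_splits = {"TRAIN": self.data}


class SketchPipeline(FullDatasetSplitMixin, BasePipelineSketch):
    """The mixin comes first so its split_dataset wins in the method resolution order."""


pipeline = SketchPipeline(data=[1, 2, 3])
pipeline.split_dataset()
print(pipeline._dataset_splits)
```

Listing the mixin before the base class matters here: Python resolves split_dataset left to right, so the mixin's version shadows the one that raises.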

object_type :str = PIPELINE[source]
X(self, split=None)[source]

Get X for specific dataset split

Parameters

split (Optional[str]) –

Return type

Any

__post_restore__(self)[source]

Extend main load routine to load relationship class

Return type

None

abstract _create_external_pipeline(self, *args, **kwargs)[source]

Each subclass should instantiate the respective pipeline library.

_filter_fit_params(self, split)[source]

Helper to filter unsupported fit params from dataset splits

Parameters

split (simpleml.pipelines.projected_splits.ProjectedDatasetSplit) –

Return type

Dict[str, Any]
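The reference does not state how unsupported fit params are detected. One plausible approach, shown here purely as an assumption, is to keep only the keys that the target fit callable names in its signature. The function filter_fit_params, the example fit function, and the dictionary-shaped split are all hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, Mapping


def filter_fit_params(split: Mapping[str, Any], fit: Callable) -> Dict[str, Any]:
    """Keep only the split entries that `fit` can accept (illustrative only)."""
    parameters = inspect.signature(fit).parameters
    # A **kwargs parameter means every key is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return dict(split)
    return {key: value for key, value in split.items() if key in parameters}


def fit(X, y=None):
    """Hypothetical fit method that does not understand sample weights."""
    return X, y


split = {"X": [[1], [2]], "y": [0, 1], "sample_weight": [1.0, 1.0]}
kept = filter_fit_params(split, fit)
print(sorted(kept))
```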

_hash(self)[source]
The hash is a combination of the following:
  1. Dataset

  2. Transformers

  3. Transformer Params

  4. Pipeline Config

Return type

str
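As a rough illustration of combining the four components into one string hash, the sketch below serializes them in a stable order and digests the result. The function combined_hash, the use of JSON and SHA-256, and the sample values are assumptions for illustration; the library's actual hashing routine may differ.

```python
import hashlib
import json


def combined_hash(dataset_hash, transformers, transformer_params, pipeline_config):
    """Digest the four components together (illustrative, not SimpleML's algorithm)."""
    payload = json.dumps(
        {
            "dataset": dataset_hash,
            "transformers": transformers,
            "transformer_params": transformer_params,
            "config": pipeline_config,
        },
        sort_keys=True,  # key order must not change the digest
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = combined_hash("abc123", ["scaler"], {"scaler__with_mean": True}, {"has_external_files": True})
same = combined_hash("abc123", ["scaler"], {"scaler__with_mean": True}, {"has_external_files": True})
changed = combined_hash("abc123", ["scaler"], {"scaler__with_mean": False}, {"has_external_files": True})
print(base == same, base == changed)
```

Identical inputs give identical digests, while changing any one component (here a transformer param) gives a different one, which is the property that makes such a hash useful for deduplicating pipelines.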

_load_dataset(self)[source]

Helper to fetch the dataset

_transform(self, X, dataset_split=None)[source]

Pass through method to external pipeline

Parameters
  • X (Any) – dataframe/matrix to transform; if None, use the internal dataset

  • dataset_split (Optional[str]) –

Return type

Split object if no dataset is passed (X is None). Otherwise, the transformed matrix for the input X

add_dataset(self, dataset)[source]

Setter method for dataset used

Parameters

dataset (simpleml.datasets.base_dataset.Dataset) –

Return type

None

add_transformer(self, name, transformer)[source]

Setter method for new transformer step

Parameters
  • name (str) –

  • transformer (Any) –

Return type

None

assert_dataset(self, msg='')[source]

Helper method to raise an error if dataset isn’t present

Parameters

msg (str) –

Return type

None

assert_fitted(self, msg='')[source]

Helper method to raise an error if pipeline isn’t fit

Parameters

msg (str) –

Return type

None
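The guard helpers can be pictured as small checks that other methods call before doing work. In this sketch, PipelineNotFittedError and GuardedPipeline are hypothetical names, and the exception type the library actually raises is not stated in this reference.

```python
class PipelineNotFittedError(Exception):
    """Hypothetical exception type used only for this sketch."""


class GuardedPipeline:
    def __init__(self):
        self.fitted = False

    def assert_fitted(self, msg=""):
        # Fail fast with the caller-supplied message.
        if not self.fitted:
            raise PipelineNotFittedError(msg)

    def fit(self):
        self.fitted = True
        return self

    def transform(self, X):
        self.assert_fitted("Call fit before transform")
        return X


pipeline = GuardedPipeline()
try:
    pipeline.transform([1, 2])
except PipelineNotFittedError as error:
    message = str(error)

result = pipeline.fit().transform([1, 2])
```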

property dataset(self)[source]

Uses a weakref to bind the linked dataset so it doesn't bloat memory usage. Returns the dataset if it is still available, or tries to fetch it otherwise.
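A weakref-backed property of this kind might look like the following sketch. DatasetSketch, PipelineSketch, and the loader callable are hypothetical; only the standard-library weakref.ref call is real API. The idea is that the pipeline keeps the dataset id plus a weak reference, and refetches when the referent has been garbage collected.

```python
import weakref


class DatasetSketch:
    """Hypothetical stand-in; instances of plain classes support weak references."""

    def __init__(self, dataset_id):
        self.id = dataset_id


class PipelineSketch:
    def __init__(self, dataset, loader):
        self.dataset_id = dataset.id
        self._dataset_ref = weakref.ref(dataset)  # does not keep the dataset alive
        self._loader = loader  # hypothetical callable that refetches by id

    @property
    def dataset(self):
        dataset = self._dataset_ref()  # returns None once the referent is gone
        if dataset is None:
            dataset = self._loader(self.dataset_id)
            self._dataset_ref = weakref.ref(dataset)
        return dataset


original = DatasetSketch("d1")
pipeline = PipelineSketch(original, loader=DatasetSketch)
still_bound = pipeline.dataset is original

del original  # drop the only strong reference
refetched = pipeline.dataset
```

The caller that receives the refetched dataset holds the strong reference; the pipeline itself never does, which is what keeps it lightweight.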

property external_pipeline(self)[source]

All pipeline objects require some file-based persisted object.

Wrapper around whatever underlying class is desired (e.g. sklearn or native)

Return type

Any

fit(self)[source]

Pass through method to external pipeline

fit_transform(self, **kwargs)[source]

Wrapper for the fit and transform methods. Assumes it only applies to the default (train) split.

Return type

Any

property fitted(self)[source]
Return type

bool

get_dataset_split(self, split=None)[source]

Get a specific dataset split. Assumes a ProjectedDatasetSplit object (simpleml.pipelines.projected_splits.ProjectedDatasetSplit) is returned; to replace it, inherit from it or implement the same expected attributes.

Uses the internal self._dataset_splits as the split container, which is assumed to support dictionary-like item access.

Parameters

split (Optional[str]) –

Return type

simpleml.pipelines.projected_splits.ProjectedDatasetSplit
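The dictionary-like container assumption can be illustrated with a small sketch. NoSplitSketch and its plain-dict projection are hypothetical; the real method returns a ProjectedDatasetSplit. A defaultdict is one simple way to get the "full dataset for any split name" behavior that the reference describes for the non-split case.

```python
from collections import defaultdict


class NoSplitSketch:
    """Hypothetical pipeline whose every split resolves to the full dataset."""

    def __init__(self, X, y):
        self._X, self._y = X, y
        self._dataset_splits = None

    def split_dataset(self):
        full = {"X": self._X, "y": self._y}
        # Any missing key, including None, resolves to the same full projection.
        self._dataset_splits = defaultdict(lambda: full)

    def get_dataset_split(self, split=None):
        if self._dataset_splits is None:
            self.split_dataset()
        return self._dataset_splits[split]  # dictionary-style item access


pipeline = NoSplitSketch(X=[[1], [2]], y=[0, 1])
train = pipeline.get_dataset_split("TRAIN")
default = pipeline.get_dataset_split()
```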

get_feature_names(self)[source]

Pass through method to external pipeline. Should return a list of the final features generated by this pipeline.

Return type

List[str]

get_params(self, **kwargs)[source]

Pass through method to external pipeline

get_split_names(self)[source]
Return type

List[str]

get_transformers(self)[source]

Pass through method to external pipeline

remove_transformer(self, name)[source]

Delete method for transformer step

Parameters

name (str) –

Return type

None
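The add_transformer and remove_transformer setters can be pictured as edits to an ordered list of named steps, similar in spirit to the (name, transformer) tuples that scikit-learn pipelines use. TransformerSteps is a hypothetical container, and the library's own storage may differ.

```python
class TransformerSteps:
    """Hypothetical ordered container of (name, transformer) pairs."""

    def __init__(self):
        self.steps = []

    def add_transformer(self, name, transformer):
        # New steps go to the end so they run last.
        self.steps.append((name, transformer))

    def remove_transformer(self, name):
        # Drop every step registered under this name.
        self.steps = [(n, t) for n, t in self.steps if n != name]


steps = TransformerSteps()
steps.add_transformer("double", lambda values: [v * 2 for v in values])
steps.add_transformer("negate", lambda values: [-v for v in values])
steps.remove_transformer("double")
remaining = [name for name, _ in steps.steps]
```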

save(self, **kwargs)[source]

Extend parent function with a few additional save routines

  1. save params

  2. save transformer metadata

  3. features

Return type

None

set_params(self, **params)[source]

Pass through method to external pipeline

split_dataset(self)[source]

Method to create a cached reference to the projected data (can't use the dataset directly because of mutation concerns).

Non-split mixin class. Returns the full dataset for any split name.

Return type

None

transform(self, X, **kwargs)[source]

Main transform routine - routes to generator or regular method depending on the flag

Parameters
  • return_generator – boolean, whether to use the transformation method that returns a generator object or the regular transformed input

  • return_sequence – boolean, whether to use the method that returns a keras.utils.Sequence object to play nice with keras models

  • X (Any) –

Return type

Any
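The flag-based routing can be sketched as a thin dispatcher over two code paths. The helper names, the doubling transformation, and the batch_size parameter are hypothetical and chosen only to make the example runnable; the return_sequence path is omitted because it depends on Keras.

```python
def _transform(X):
    """Regular path: return the whole transformed input at once."""
    return [value * 2 for value in X]


def _transform_generator(X, batch_size):
    """Generator path: yield transformed batches lazily."""
    for start in range(0, len(X), batch_size):
        yield _transform(X[start:start + batch_size])


def transform(X, return_generator=False, batch_size=2):
    # Route on the flag; both paths share the same underlying transformation.
    if return_generator:
        return _transform_generator(X, batch_size)
    return _transform(X)


eager = transform([1, 2, 3])
lazy = transform([1, 2, 3], return_generator=True)
batches = list(lazy)
```

Keeping the dispatcher separate from the two implementations means callers choose between eager and lazy evaluation with a single keyword argument.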

y(self, split=None)[source]

Get labels for specific dataset split

Parameters

split (Optional[str]) –

Return type

Any