simpleml.pipelines.base_pipeline module

Base Module for Pipelines

class simpleml.pipelines.base_pipeline.AbstractPipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)[source]

Bases: simpleml.persistables.base_persistable.Persistable, simpleml.persistables.saving.AllSaveMixin

Abstract base class for all Pipeline objects.

Relies on mixin classes to define the split_dataset method. Without such a mixin, using the pipeline raises an error.

params: pipeline parameter metadata for easy insight into hyperparameters across trainings
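The mixin arrangement described above can be sketched as follows. This is a minimal illustration, not SimpleML's code: `ToyPipelineBase`, `ToyNoSplitMixin`, `ToyPipeline`, the `rows` attribute, and the choice of `NotImplementedError` are all hypothetical stand-ins.

```python
class ToyPipelineBase:
    """Hypothetical stand-in for an abstract pipeline (not SimpleML code)."""

    def split_dataset(self):
        # The base class deliberately provides no split logic.
        # The exception type here is an assumption.
        raise NotImplementedError("a split mixin must define split_dataset")


class ToyNoSplitMixin:
    """Hypothetical mixin that exposes the whole dataset as one split."""

    def split_dataset(self):
        return {"ALL": self.rows}


class ToyPipeline(ToyNoSplitMixin, ToyPipelineBase):
    """Listing the mixin first lets its split_dataset win in the MRO."""

    def __init__(self, rows):
        self.rows = rows


splits = ToyPipeline([1, 2, 3]).split_dataset()
```

Because the mixin precedes the base class in the inheritance list, its `split_dataset` is resolved first; a class built without any mixin would fall through to the base method and raise.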

add_dataset(dataset)[source]

Setter method for dataset used

add_transformer(name, transformer)[source]

Setter method for new transformer step

assert_dataset(msg='')[source]

Helper method to raise an error if dataset isn’t present

assert_fitted(msg='')[source]

Helper method to raise an error if pipeline isn’t fit
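A guard-helper pattern like the one these two methods describe might look like the sketch below. The class `ToyGuardedPipeline`, its attributes, and the use of `ValueError` are assumptions for illustration; the source does not say which exception type SimpleML raises.

```python
class ToyGuardedPipeline:
    """Hypothetical pipeline showing assert-style guard helpers."""

    def __init__(self):
        self.dataset = None
        self.fitted = False

    def assert_dataset(self, msg=""):
        # Raise if no dataset has been attached (exception type assumed).
        if self.dataset is None:
            raise ValueError(msg)

    def assert_fitted(self, msg=""):
        # Raise if the pipeline has not been fitted (exception type assumed).
        if not self.fitted:
            raise ValueError(msg)

    def fit(self):
        self.assert_dataset("Must set a dataset before fitting")
        self.fitted = True
        return self

    def transform(self, X):
        self.assert_fitted("Must fit the pipeline before transforming")
        return X


pipeline = ToyGuardedPipeline()
pipeline.dataset = [1, 2, 3]
result = pipeline.fit().transform([4, 5])
```

Calling the guards at the top of `fit` and `transform` keeps the precondition checks in one place and gives each call site its own error message.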

external_pipeline

All pipeline objects require some file-based persisted object.

Wrapper around whatever underlying class is desired (e.g. sklearn or native)

fit(**kwargs)[source]

Pass through method to external pipeline

fit_transform(return_y=False, **kwargs)[source]

Wrapper for the fit and transform methods. Assumes it applies only to the train split.

Parameters: return_y – whether to return y with the output; necessary for fitting a supervised model afterwards
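To illustrate the `return_y` behaviour, here is a minimal sketch of a fit-then-transform wrapper over an internal train split. `ToyScalingPipeline` and its attributes are hypothetical; only the `return_y` flag and the "use the internal dataset when X is None" idea come from this page.

```python
class ToyScalingPipeline:
    """Hypothetical pipeline that divides each value by the training maximum."""

    def __init__(self, train_X, train_y):
        self.train_X = train_X
        self.train_y = train_y
        self.max_ = None

    def fit(self):
        # Learn the scaling constant from the train split only.
        self.max_ = max(self.train_X)
        return self

    def transform(self, X=None, return_y=False):
        use_internal = X is None
        if use_internal:
            X = self.train_X
        output = [x / self.max_ for x in X]
        # y is only available when the internal dataset was used.
        if return_y and use_internal:
            return output, self.train_y
        return output

    def fit_transform(self, return_y=False):
        # Fit, then transform the same (train) split.
        self.fit()
        return self.transform(X=None, return_y=return_y)


pipeline = ToyScalingPipeline(train_X=[1.0, 2.0, 4.0], train_y=[0, 1, 1])
X_out, y_out = pipeline.fit_transform(return_y=True)
```

Returning `(X, y)` together lets a downstream supervised model be fitted directly on the transformed train split.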
fitted
get_dataset_split(split=None)[source]

Get a specific dataset split. By default no constraint is imposed, but by convention the return value should be a tuple of (X, y).

get_feature_names()[source]

Pass through method to external pipeline. Should return a list of the final features generated by this pipeline.

get_params(**kwargs)[source]

Pass through method to external pipeline

get_transformers()[source]

Pass through method to external pipeline

load(**kwargs)[source]

Extend main load routine to load relationship class

object_type = 'PIPELINE'
params = Column(None, JSONB(astext_type=Text()), table=None, default=ColumnDefault({}))
remove_transformer(name)[source]

Delete method for transformer step

save(**kwargs)[source]

Extend parent function with a few additional save routines

  1. save params
  2. save transformer metadata
  3. save features
set_params(**params)[source]

Pass through method to external pipeline

transform(X, dataset_split=None, return_y=False, **kwargs)[source]

Pass through method to external pipeline

Parameters:
  • X – dataframe/matrix to transform, if None, use internal dataset
  • return_y – whether to return y with the output; only used if X is None. Necessary for fitting a supervised model afterwards
class simpleml.pipelines.base_pipeline.GeneratorPipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)[source]

Bases: simpleml.pipelines.base_pipeline.Pipeline

Generator form of pipeline. Overrides standard methods with ones that return generator objects.

author
created_timestamp
dataset
dataset_id
filepaths
fit(**kwargs)[source]

Pass through method to external pipeline. Assumes the underlying pipeline can make use of a generator to fit.

get_dataset_split(split=None, infinite_loop=False, batch_size=32, shuffle=True, **kwargs)[source]

Get specific dataset split
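The signature suggests batched, optionally shuffled and optionally endless iteration over a split. The sketch below shows one plausible reading of those parameters; `iter_batches` and its `seed` argument are hypothetical, and SimpleML's actual semantics may differ.

```python
import random
from itertools import islice


def iter_batches(X, y, batch_size=32, shuffle=True, infinite_loop=False, seed=None):
    """Yield (X_batch, y_batch) tuples; a hypothetical helper, not SimpleML code."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    while True:
        if shuffle:
            # Reshuffle at the start of every pass over the data.
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield [X[i] for i in batch], [y[i] for i in batch]
        if not infinite_loop:
            break


X = [10, 11, 12]
y = [0, 1, 0]

# One finite pass: the last batch may be smaller than batch_size.
single_pass = list(iter_batches(X, y, batch_size=2, shuffle=False))

# An endless generator must be bounded by the consumer, here with islice.
first_four = list(islice(iter_batches(X, y, batch_size=2, shuffle=False,
                                      infinite_loop=True), 4))
```

An endless generator suits training loops that ask for a fixed number of steps per epoch, while a single finite pass is the safer default for evaluation.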

has_external_files
hash_
id
metadata_
modified_timestamp
name
params
project
registered_name
transform(X, dataset_split=None, return_y=False, **kwargs)[source]

Pass through method to external pipeline

Parameters:
  • X – dataframe/matrix to transform, if None, use internal dataset
  • return_y – whether to return y with the output; only used if X is None. Necessary for fitting a supervised model afterwards
version
version_description
class simpleml.pipelines.base_pipeline.Pipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)[source]

Bases: simpleml.pipelines.base_pipeline.AbstractPipeline

Base class for all Pipeline objects.

dataset_id: foreign key relation to the dataset used as input

author
created_timestamp
dataset
dataset_id
filepaths
has_external_files
hash_
id
metadata_
modified_timestamp
name
params
project
registered_name
version
version_description