simpleml.pipelines.base_pipeline module

class simpleml.pipelines.base_pipeline.BasePipeline(has_external_files=True, transformers=[], **kwargs)[source]

Bases: simpleml.persistables.base_persistable.BasePersistable, simpleml.persistables.saving.AllSaveMixin

Base class for all Pipeline objects.

params: pipeline parameter metadata, stored for easy insight into hyperparameters across training runs

add_dataset(dataset)[source]

Setter method for dataset used

add_transformer(name, transformer)[source]

Setter method for new transformer step

external_pipeline

All pipeline objects require some file-based persisted object

Wrapper around whatever underlying class is desired (eg sklearn or native)
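The wrapper pattern described here can be illustrated with a minimal, dependency-free sketch. Every class below (`StripTransformer`, `UppercaseTransformer`, `ExternalPipelineSketch`, `PipelineWrapperSketch`) is a hypothetical stand-in and not part of simpleml. The sketch also assumes `fit` and `transform` receive the data directly, which differs from the `fit(**kwargs)` signature documented on this page.

```python
class StripTransformer:
    """Hypothetical transformer: trims whitespace from each string."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [value.strip() for value in X]


class UppercaseTransformer:
    """Hypothetical transformer: uppercases each string."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [value.upper() for value in X]


class ExternalPipelineSketch:
    """Stand-in for the wrapped object (e.g. an sklearn-style pipeline)."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])  # ordered (name, transformer) pairs

    def fit(self, X, y=None):
        # Fit each step on the output of the previous one.
        for _, transformer in self.steps:
            X = transformer.fit(X, y).transform(X)
        return self

    def transform(self, X):
        for _, transformer in self.steps:
            X = transformer.transform(X)
        return X


class PipelineWrapperSketch:
    """Illustrative wrapper only; NOT the BasePipeline implementation."""

    def __init__(self, transformers=None):
        self._external_pipeline = ExternalPipelineSketch(transformers)

    @property
    def external_pipeline(self):
        return self._external_pipeline

    def add_transformer(self, name, transformer):
        # Append a new named step to the wrapped pipeline.
        self.external_pipeline.steps.append((name, transformer))

    def remove_transformer(self, name):
        # Drop every step registered under this name.
        self.external_pipeline.steps = [
            (n, t) for n, t in self.external_pipeline.steps if n != name
        ]

    def get_transformers(self):
        return [name for name, _ in self.external_pipeline.steps]

    def fit(self, X, y=None):
        # Pass through to the wrapped pipeline.
        self.external_pipeline.fit(X, y)
        return self

    def transform(self, X):
        # Pass through to the wrapped pipeline.
        return self.external_pipeline.transform(X)


pipeline = PipelineWrapperSketch()
pipeline.add_transformer("strip", StripTransformer())
pipeline.add_transformer("upper", UppercaseTransformer())
result = pipeline.fit([" a "]).transform([" b "])
```

Keeping all logic in the wrapped object means the wrapper only has to manage persistence and metadata, which is one plausible reason for the pass-through design.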

fit(**kwargs)[source]

Pass through method to external pipeline

fit_transform(return_y=False, **kwargs)[source]

Wrapper for the fit and transform methods. ASSUMES it only applies to the train split.

Parameters: return_y – whether to return y with the output; necessary for fitting a supervised model afterwards

get_dataset_split(split=None)[source]

Get a specific dataset split. By default no constraint is imposed, but by convention the return value should be a tuple of (X, y).

get_feature_names()[source]

Pass through method to external pipeline. Should return a list of the final features generated by this pipeline.

get_params(**kwargs)[source]

Pass through method to external pipeline

get_transformers()[source]

Pass through method to external pipeline

load(**kwargs)[source]

Extend main load routine to load relationship class

params = Column(None, JSONB(astext_type=Text()), table=None, default=ColumnDefault({}))

remove_transformer(name)[source]

Delete method for transformer step

save(**kwargs)[source]

Extend parent function with a few additional save routines

  1. save params
  2. save transformer metadata
  3. save features

set_params(**params)[source]

Pass through method to external pipeline
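As a rough illustration of how pass-through `get_params` and `set_params` calls could feed the JSON-backed `params` attribute, consider the sketch below. The classes `ExternalPipelineSketch` and `PipelineParamsSketch`, the `snapshot_params` helper, and the parameter names are all hypothetical; the actual save routine may work differently.

```python
import json


class ExternalPipelineSketch:
    """Hypothetical wrapped pipeline that owns the hyperparameters."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, **kwargs):
        return dict(self._params)

    def set_params(self, **params):
        self._params.update(params)


class PipelineParamsSketch:
    """Illustrative wrapper only; NOT the BasePipeline implementation."""

    def __init__(self, external_pipeline):
        self.external_pipeline = external_pipeline
        self.params = {}  # stand-in for the JSONB-backed column

    def get_params(self, **kwargs):
        # Pass through to the wrapped pipeline.
        return self.external_pipeline.get_params(**kwargs)

    def set_params(self, **params):
        # Pass through to the wrapped pipeline.
        return self.external_pipeline.set_params(**params)

    def snapshot_params(self):
        # Hypothetical helper: round-trip through JSON so that only
        # JSON-serializable metadata is recorded.
        self.params = json.loads(json.dumps(self.get_params()))
        return self.params


pipeline = PipelineParamsSketch(
    ExternalPipelineSketch(scaler__with_mean=True, n_components=2)
)
pipeline.set_params(n_components=5)
snapshot = pipeline.snapshot_params()
```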

transform(X, dataset_split=None, return_y=False, **kwargs)[source]

Pass through method to external pipeline

Parameters:
  • X – dataframe/matrix to transform; if None, the internal dataset is used
  • return_y – whether to return y with the output; only used if X is None. Necessary for fitting a supervised model afterwards
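The interaction between `X`, `dataset_split`, and `return_y` can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `DatasetSketch`, `ScaleByMax`, and `PipelineSketch` are hypothetical, and the split names `"TRAIN"` and `"TEST"` are assumed for the example.

```python
class DatasetSketch:
    """Hypothetical dataset returning each split as an (X, y) tuple."""

    def __init__(self, splits):
        self._splits = splits

    def get(self, split=None):
        return self._splits[split]


class ScaleByMax:
    """Hypothetical transformer: divides values by the fitted maximum."""

    def fit(self, X, y=None):
        self.max_ = max(X)
        return self

    def transform(self, X):
        return [value / self.max_ for value in X]


class PipelineSketch:
    """Illustrative only; NOT the BasePipeline implementation."""

    def __init__(self, transformer):
        self.transformer = transformer
        self.dataset = None

    def add_dataset(self, dataset):
        self.dataset = dataset

    def get_dataset_split(self, split=None):
        # Convention: return a tuple of (X, y).
        return self.dataset.get(split)

    def fit(self):
        # Assumption: fitting always uses the train split.
        X, y = self.get_dataset_split("TRAIN")
        self.transformer.fit(X, y)
        return self

    def transform(self, X=None, dataset_split=None, return_y=False):
        if X is None:
            # Fall back to the internal dataset.
            X, y = self.get_dataset_split(dataset_split)
            output = self.transformer.transform(X)
            return (output, y) if return_y else output
        # return_y is ignored when X is passed explicitly.
        return self.transformer.transform(X)

    def fit_transform(self, return_y=False):
        self.fit()
        return self.transform(X=None, dataset_split="TRAIN", return_y=return_y)


pipeline = PipelineSketch(ScaleByMax())
pipeline.add_dataset(
    DatasetSketch({"TRAIN": ([2.0, 4.0], [0, 1]), "TEST": ([1.0], [1])})
)
X_train, y_train = pipeline.fit_transform(return_y=True)
X_test, y_test = pipeline.transform(dataset_split="TEST", return_y=True)
external = pipeline.transform([8.0], return_y=True)
```

Returning y alongside the transformed X lets a downstream supervised model be fit directly on the pair, without a second lookup of the split.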