simpleml.pipelines.base_pipeline
Base Module for Pipelines
Module Contents
Classes
Pipeline – Abstract base class for all Pipeline objects.
Attributes
- class simpleml.pipelines.base_pipeline.Pipeline(has_external_files=True, transformers=None, fitted=False, dataset_id=None, **kwargs)
Bases: simpleml.persistables.base_persistable.Persistable
Abstract base class for all Pipeline objects.
Relies on mixin classes to define the split_dataset method; raises an error on use otherwise.
params: pipeline parameter metadata for easy insight into hyperparameters across trainings
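The reliance on a mixin for split_dataset can be illustrated with a minimal sketch. All class names here (BasePipelineSketch, HoldoutSplitMixinSketch, HoldoutPipelineSketch), the "TRAIN"/"TEST" split names, and the use of NotImplementedError are hypothetical stand-ins, not SimpleML's actual classes or error type.

```python
class BasePipelineSketch:
    """Hypothetical stand-in for the abstract Pipeline base class."""

    def __init__(self, data):
        self.data = data
        self._dataset_splits = None

    def split_dataset(self):
        # No splitting strategy is defined here, so using it is an error
        raise NotImplementedError("Mix in a class that defines split_dataset")


class HoldoutSplitMixinSketch:
    """Hypothetical mixin: holds out the last row as a TEST split."""

    def split_dataset(self):
        self._dataset_splits = {"TRAIN": self.data[:-1], "TEST": self.data[-1:]}


# The mixin is listed first so its split_dataset wins in the method resolution order
class HoldoutPipelineSketch(HoldoutSplitMixinSketch, BasePipelineSketch):
    pass


pipeline = HoldoutPipelineSketch(data=[1, 2, 3, 4])
pipeline.split_dataset()
print(pipeline._dataset_splits)  # {'TRAIN': [1, 2, 3], 'TEST': [4]}
```

Listing the mixin before the base class is what lets its split_dataset override the base version.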
- Parameters
- X(self, split=None)
Get X for a specific dataset split.
- Parameters
split (Optional[str]) –
- Return type
Any
- __post_restore__(self)
Extends the main load routine to load the relationship class.
- Return type
None
- abstract _create_external_pipeline(self, *args, **kwargs)
Each subclass should instantiate the respective pipeline library.
- _filter_fit_params(self, split)
Helper to filter unsupported fit params from dataset splits
- Parameters
split (simpleml.pipelines.projected_splits.ProjectedDatasetSplit) –
- Return type
Dict[str, Any]
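The helper above is described only as dropping unsupported fit params. A minimal sketch of one possible approach, assuming the split is a dict of named inputs and that "supported" means "accepted by the external fit method's signature", uses inspect.signature. The names filter_fit_params and ToyTransformer are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def filter_fit_params(fit: Callable[..., Any], split: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the split entries that `fit` accepts as keyword arguments."""
    params = inspect.signature(fit).parameters
    # If fit takes **kwargs, every entry is acceptable
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(split)
    return {k: v for k, v in split.items() if k in params}


class ToyTransformer:
    """Hypothetical transformer whose fit only understands X and y."""

    def fit(self, X, y=None):
        return self


split = {"X": [[1.0], [2.0]], "y": [0, 1], "other": "unsupported"}
print(filter_fit_params(ToyTransformer().fit, split))
# {'X': [[1.0], [2.0]], 'y': [0, 1]}
```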
- _hash(self)
Hash is the combination of the dataset, transformers, transformer params, and pipeline config.
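One way to combine the dataset, transformers, transformer params, and pipeline config into a single deterministic fingerprint is sketched below. The function name pipeline_hash, the JSON serialization, and the choice of SHA-256 are assumptions for illustration; SimpleML's own hashing may differ.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple


def pipeline_hash(dataset_hash: str,
                  transformers: List[Tuple[str, str]],
                  transformer_params: Dict[str, Dict[str, Any]],
                  config: Dict[str, Any]) -> str:
    """Hypothetical: fingerprint the dataset, transformers, params, and config."""
    payload = {
        "dataset": dataset_hash,
        "transformers": transformers,  # a list, so step order affects the hash
        "transformer_params": transformer_params,
        "config": config,
    }
    # sort_keys only reorders dict keys, so equal dicts serialize identically
    blob = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


steps = [("impute", "SimpleImputer"), ("scale", "StandardScaler")]
base = pipeline_hash("abc123", steps, {"scale": {"with_mean": True}}, {})
tweaked = pipeline_hash("abc123", steps, {"scale": {"with_mean": False}}, {})
print(base != tweaked)  # True: changing one transformer param changes the hash
```

Keeping transformers as an ordered list rather than a dict matters here, because reordering pipeline steps changes the result of the pipeline.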
- Return type
- _transform(self, X, dataset_split=None)
Pass-through method to the external pipeline.
- Parameters
X (Any) – dataframe/matrix to transform; if None, the internal dataset is used
dataset_split (Optional[str]) –
- Return type
A split object if no data is passed (X is None); otherwise the transformed matrix for input X
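The two return shapes of _transform can be shown with a small sketch. SplitSketch, transform_passthrough, double, and the default split name "TRAIN" are hypothetical; the sketch only mirrors the documented behavior of returning a split (with X replaced) when no data is passed and a plain matrix otherwise.

```python
from typing import Any, Callable, Dict, Optional


class SplitSketch(dict):
    """Hypothetical dict-backed split holding X, y, and any other named inputs."""


def transform_passthrough(transform: Callable[[Any], Any],
                          splits: Dict[str, SplitSketch],
                          X: Optional[Any] = None,
                          dataset_split: str = "TRAIN") -> Any:
    if X is None:
        split = splits[dataset_split]
        # Transform only X (skipping missing input); y and other entries pass through
        output = None if split.get("X") is None else transform(split["X"])
        return SplitSketch(split, X=output)
    # Data supplied directly: return the transformed matrix only
    return transform(X)


def double(rows):
    return [[2 * v for v in row] for row in rows]


splits = {"TRAIN": SplitSketch(X=[[1, 2]], y=[0])}
print(transform_passthrough(double, splits))           # {'X': [[2, 4]], 'y': [0]}
print(transform_passthrough(double, splits, X=[[5]]))  # [[10]]
```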
- add_dataset(self, dataset)
Setter method for the dataset used.
- Parameters
dataset (simpleml.datasets.base_dataset.Dataset) –
- Return type
None
- add_transformer(self, name, transformer)
Setter method for a new transformer step.
- Parameters
name (str) –
transformer (Any) –
- Return type
None
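A named, ordered set of transformer steps can be kept in a plain dict. The sketch below assumes that design (the class TransformerStepsSketch and its steps property are hypothetical) and also covers the matching remove_transformer method.

```python
from typing import Any, Dict, List, Tuple


class TransformerStepsSketch:
    """Hypothetical container for named transformer steps."""

    def __init__(self) -> None:
        # dicts preserve insertion order (Python 3.7+), which serves as step order
        self._transformers: Dict[str, Any] = {}

    def add_transformer(self, name: str, transformer: Any) -> None:
        # Re-adding an existing name replaces it in its original position
        self._transformers[name] = transformer

    def remove_transformer(self, name: str) -> None:
        del self._transformers[name]

    @property
    def steps(self) -> List[Tuple[str, Any]]:
        return list(self._transformers.items())


registry = TransformerStepsSketch()
registry.add_transformer("impute", "imputer")
registry.add_transformer("scale", "scaler")
registry.remove_transformer("impute")
print(registry.steps)  # [('scale', 'scaler')]
```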
- assert_dataset(self, msg='')
Helper method to raise an error if the dataset isn’t present.
- Parameters
msg (str) –
- Return type
None
- assert_fitted(self, msg='')
Helper method to raise an error if the pipeline isn’t fit.
- Parameters
msg (str) –
- Return type
None
- property dataset(self)
Uses a weakref to bind the linked dataset so it doesn’t bloat memory usage. Returns the dataset if it is still available; otherwise tries to fetch it.
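The weakref-then-fetch behavior can be sketched with the standard weakref module. DatasetSketch, PipelineSketch, and the fetch callable (standing in for whatever lookup reloads a dataset) are hypothetical.

```python
import gc
import weakref
from typing import Callable, Optional


class DatasetSketch:
    """Hypothetical dataset object (instances support weak references)."""

    def __init__(self, name: str) -> None:
        self.name = name


class PipelineSketch:
    def __init__(self, fetch: Callable[[], DatasetSketch]) -> None:
        self._fetch = fetch  # hypothetical loader, e.g. a database lookup
        self._dataset_ref: Optional[weakref.ref] = None

    def add_dataset(self, dataset: DatasetSketch) -> None:
        # A weak reference does not keep the dataset alive on its own
        self._dataset_ref = weakref.ref(dataset)

    @property
    def dataset(self) -> DatasetSketch:
        dataset = self._dataset_ref() if self._dataset_ref is not None else None
        if dataset is None:
            # Referent was garbage collected (or never set): fetch and rebind
            dataset = self._fetch()
            self._dataset_ref = weakref.ref(dataset)
        return dataset


ds = DatasetSketch("in-memory")
pipe = PipelineSketch(fetch=lambda: DatasetSketch("refetched"))
pipe.add_dataset(ds)
print(pipe.dataset.name)  # in-memory
del ds        # drop the only strong reference
gc.collect()  # make collection explicit on non-refcounting interpreters
print(pipe.dataset.name)  # refetched
```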
- property external_pipeline(self)
All pipeline objects require some file-based persisted object.
Wrapper around whatever underlying class is desired (e.g. sklearn or native).
- Return type
Any
- fit_transform(self, **kwargs)
Wrapper for the fit and transform methods. Assumes it only applies to the default (train) split.
- Return type
Any
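Because fit_transform is documented as applying only to the default (train) split, other splits are simply ignored. A minimal sketch of that assumption, with a hypothetical mean-centering transformer standing in for the external pipeline and "TRAIN" as an assumed split name:

```python
from typing import Any, Dict, List


class MeanCenterSketch:
    """Hypothetical stand-in for an external pipeline with fit/transform."""

    def fit(self, X: List[float]) -> "MeanCenterSketch":
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X: List[float]) -> List[float]:
        return [x - self.mean_ for x in X]


def fit_transform_default(external: MeanCenterSketch,
                          splits: Dict[str, Dict[str, Any]],
                          default_split: str = "TRAIN") -> List[float]:
    # Only the default split is fit and transformed; TEST data is never seen
    X = splits[default_split]["X"]
    return external.fit(X).transform(X)


splits = {"TRAIN": {"X": [1.0, 2.0, 3.0]}, "TEST": {"X": [10.0]}}
print(fit_transform_default(MeanCenterSketch(), splits))  # [-1.0, 0.0, 1.0]
```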
- get_dataset_split(self, split=None)
Get a specific dataset split. Assumes a ProjectedDatasetSplit object (simpleml.pipelines.projected_splits.ProjectedDatasetSplit) is returned; to substitute another class, inherit from it or implement similar expected attributes.
Uses the internal self._dataset_splits as the split container and assumes dictionary-like item access.
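The lookup described above, together with the X accessor, can be sketched with any mapping as the container. Here SimpleNamespace stands in for a split object exposing X and y attributes; SplitContainerSketch and the default name "TRAIN" are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


class SplitContainerSketch:
    """Hypothetical holder of projected splits keyed by split name."""

    def __init__(self, splits: Dict[str, SimpleNamespace], default: str = "TRAIN") -> None:
        self._dataset_splits = splits
        self._default = default

    def get_dataset_split(self, split: Optional[str] = None) -> SimpleNamespace:
        # Fall back to the default split name when none is given
        key = self._default if split is None else split
        # Any mapping supporting __getitem__ works (dict, defaultdict, ...)
        return self._dataset_splits[key]

    def X(self, split: Optional[str] = None) -> Any:
        return self.get_dataset_split(split).X


pipe = SplitContainerSketch({
    "TRAIN": SimpleNamespace(X=[[1], [2]], y=[0, 1]),
    "TEST": SimpleNamespace(X=[[3]], y=[1]),
})
print(pipe.X())        # [[1], [2]]
print(pipe.X("TEST"))  # [[3]]
```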
- Parameters
split (Optional[str]) –
- Return type
- get_feature_names(self)
Pass-through method to the external pipeline. Should return a list of the final features generated by this pipeline.
- Return type
List[str]
- remove_transformer(self, name)
Delete method for a transformer step.
- Parameters
name (str) –
- Return type
None
- save(self, **kwargs)
Extends the parent function with a few additional save routines: saving params, transformer metadata, and features.
- Return type
None
- split_dataset(self)
Creates a cached reference to the projected data (the dataset can’t be used directly because of mutation concerns).
Non-split mixin behavior: returns the full dataset for any split name.
- Return type
None
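The non-split behavior can be sketched with a collections.defaultdict whose factory always returns the same cached snapshot, so any split name resolves to the full data. NoSplitSketch is hypothetical, and the shallow copy is an assumed way of addressing the mutation concern: it guards against keys being reassigned on the source dict but does not deep-copy the arrays.

```python
from collections import defaultdict
from typing import Any, Dict


class NoSplitSketch:
    """Hypothetical: every split name resolves to the full dataset."""

    def __init__(self, dataset: Dict[str, Any]) -> None:
        self._dataset = dataset
        self._dataset_splits = None

    def split_dataset(self) -> None:
        # Shallow snapshot: reassigning keys on the source won't affect the cache
        snapshot = dict(self._dataset)
        # Any missing split name maps to the same cached snapshot
        self._dataset_splits = defaultdict(lambda: snapshot)


data = {"X": [[1], [2]], "y": [0, 1]}
pipe = NoSplitSketch(data)
pipe.split_dataset()
print(pipe._dataset_splits["TRAIN"] is pipe._dataset_splits["ANY_NAME"])  # True
data["X"] = None
print(pipe._dataset_splits["TEST"]["X"])  # [[1], [2]]
```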
- transform(self, X, **kwargs)
Main transform routine: routes to the generator or regular method depending on the flag.
- Parameters
X (Any) –
return_generator – boolean, whether to use the transformation method that returns a generator object or the regular transformed input
return_sequence – boolean, whether to use the method that returns a keras.utils.Sequence object to play nicely with Keras models
- Return type
Any
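Routing on the return_generator flag can be sketched as choosing between a lazy generator and an eagerly built result. The function transform_routed and the per-row transform are hypothetical, and the return_sequence (Keras) path is deliberately left out so the sketch needs only the standard library.

```python
from typing import Any, Callable, List


def transform_routed(transform_one: Callable[[Any], Any], X: List[Any],
                     return_generator: bool = False) -> Any:
    """Hypothetical router: lazy generator or eager transform, chosen by a flag."""
    if return_generator:
        # Rows are transformed only when the caller iterates
        return (transform_one(row) for row in X)
    # Regular path: transform everything up front
    return [transform_one(row) for row in X]


def square(v):
    return v * v


lazy = transform_routed(square, [1, 2, 3], return_generator=True)
print(next(lazy))                           # 1
print(transform_routed(square, [1, 2, 3]))  # [1, 4, 9]
```

The generator path suits inputs too large to transform at once, at the cost of single-pass iteration.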