simpleml.pipelines.base_pipeline module
Base Module for Pipelines
-
class simpleml.pipelines.base_pipeline.AbstractPipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)
Bases: simpleml.persistables.base_persistable.Persistable, simpleml.persistables.saving.AllSaveMixin
Abstract base class for all Pipeline objects.
Relies on mixin classes to define the split_dataset method; calling it without such a mixin raises an error.
params: pipeline parameter metadata, for easy insight into hyperparameters across training runs
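The mixin arrangement can be sketched in plain Python. All class names here (`AbstractPipelineSketch`, `NoSplitMixinSketch`, `SingleSplitPipelineSketch`) and the `data`/`splits` attributes are hypothetical, not SimpleML's actual classes; the sketch only shows a base class that fails loudly until a mixin earlier in the MRO supplies `split_dataset`.

```python
class AbstractPipelineSketch:
    """Hypothetical base class that leaves split_dataset to a mixin."""

    def split_dataset(self):
        # Without a mixin overriding this method, any call fails loudly
        raise NotImplementedError("mix in a class that defines split_dataset")


class NoSplitMixinSketch:
    """Hypothetical mixin: treat the whole dataset as one unnamed split."""

    def split_dataset(self):
        # Assumption: raw data lives on self.data as an (X, y) tuple
        self.splits = {None: self.data}


# The mixin comes first in the bases, so its split_dataset wins in the MRO
class SingleSplitPipelineSketch(NoSplitMixinSketch, AbstractPipelineSketch):
    def __init__(self, data):
        self.data = data


pipeline = SingleSplitPipelineSketch(([[1], [2]], [0, 1]))
pipeline.split_dataset()
print(pipeline.splits[None])  # ([[1], [2]], [0, 1])
```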
-
external_pipeline
All pipeline objects require some file-based persisted object: a wrapper around whatever underlying class is desired (e.g. sklearn or native).
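As a rough illustration of what a "native" external pipeline might look like, the sketch below assumes an ordered list of `(name, transformer)` steps, similar in shape to the `steps` of scikit-learn's `sklearn.pipeline.Pipeline`. `AddConstant` and `NativePipelineSketch` are hypothetical; SimpleML's own wrapper classes may differ.

```python
class AddConstant:
    """Hypothetical toy transformer that adds a constant to every value."""

    def __init__(self, value):
        self.value = value

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return [[v + self.value for v in row] for row in X]


class NativePipelineSketch:
    """Hypothetical native pipeline: an ordered list of (name, transformer) steps."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        # Fit each step on the output of the previous step
        for _, step in self.steps:
            X = step.fit(X, y).transform(X)
        return self

    def transform(self, X):
        for _, step in self.steps:
            X = step.transform(X)
        return X


external_pipeline = NativePipelineSketch([("add1", AddConstant(1)), ("add10", AddConstant(10))])
print(external_pipeline.fit([[0, 1]]).transform([[0, 1]]))  # [[11, 12]]
```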
-
fit_transform(return_y=False, **kwargs)
Wrapper for the fit and transform methods. ASSUMES it only applies to the train split.
Parameters: return_y – whether to return y with the output; necessary for fitting a supervised model afterwards
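A minimal sketch of the fit-then-transform flow on the train split, assuming the pipeline keeps its splits in a dict keyed by name. The `"TRAIN"` key, `MeanCenter`, and `PipelineSketch` are all hypothetical:

```python
class MeanCenter:
    """Hypothetical toy transformer: subtracts per-column means learned in fit."""

    def fit(self, X, y=None):
        n = len(X)
        self.means = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means)] for row in X]


class PipelineSketch:
    def __init__(self, external_pipeline, splits):
        self.external_pipeline = external_pipeline
        self.splits = splits  # assumption: {"TRAIN": (X, y), ...}

    def fit_transform(self, return_y=False):
        # Only the train split is used, mirroring the documented assumption
        X, y = self.splits["TRAIN"]
        output = self.external_pipeline.fit(X, y).transform(X)
        if return_y:
            # Keep y alongside the features so a supervised model can be fit next
            return output, y
        return output


pipeline = PipelineSketch(MeanCenter(), {"TRAIN": ([[1.0], [3.0]], [0, 1])})
X_train, y_train = pipeline.fit_transform(return_y=True)
print(X_train, y_train)  # [[-1.0], [1.0]] [0, 1]
```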
-
fitted
-
get_dataset_split(split=None)
Get a specific dataset split. By default no constraint is imposed, but by convention the return value should be a tuple of (X, y).
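The `(X, y)` return convention can be made explicit with a `NamedTuple`, which still unpacks like a plain tuple. This is a sketch under the assumption that splits are stored by name, with `None` meaning the unsplit data; `Split` and `SplitStoreSketch` are hypothetical names.

```python
from typing import Any, Dict, NamedTuple, Optional


class Split(NamedTuple):
    """Hypothetical (X, y) pair; a NamedTuple still unpacks like a plain tuple."""

    X: Any
    y: Any


class SplitStoreSketch:
    def __init__(self, splits: Dict[Optional[str], Split]):
        # Assumption: splits are stored by name, with None meaning "no split"
        self.splits = splits

    def get_dataset_split(self, split: Optional[str] = None) -> Split:
        return self.splits[split]


store = SplitStoreSketch({
    None: Split([[1], [2], [3]], [0, 1, 1]),
    "TRAIN": Split([[1], [2]], [0, 1]),
})
X, y = store.get_dataset_split("TRAIN")  # unpacks like (X, y)
print(X, y)  # [[1], [2]] [0, 1]
```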
-
get_feature_names()
Pass-through method to the external pipeline. Should return a list of the final features generated by this pipeline.
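A pass-through method simply delegates to the wrapped object. In this hypothetical sketch, `FinalStepSketch` stands in for an external pipeline that knows its output feature names, and `PipelineWrapperSketch` forwards the call unchanged:

```python
class FinalStepSketch:
    """Hypothetical last step of an external pipeline that knows its output features."""

    def __init__(self, names):
        self.names = list(names)

    def get_feature_names(self):
        return list(self.names)


class PipelineWrapperSketch:
    def __init__(self, external_pipeline):
        self.external_pipeline = external_pipeline

    def get_feature_names(self):
        # Pass-through: the wrapper adds no logic and simply delegates
        return self.external_pipeline.get_feature_names()


wrapper = PipelineWrapperSketch(FinalStepSketch(["age", "income_log"]))
print(wrapper.get_feature_names())  # ['age', 'income_log']
```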
-
object_type = 'PIPELINE'
-
params = Column(None, JSONB(astext_type=Text()), table=None, default=ColumnDefault({}))
-
save(**kwargs)
Extends the parent function with a few additional save routines:
- save params
- save transformer metadata
- save features
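The "extend the parent" pattern can be sketched with `super()`. `PersistableSketch` is a hypothetical stand-in for the parent save routine, and the `record` dict and its keys are assumptions, not SimpleML's storage layout:

```python
class PersistableSketch:
    """Hypothetical parent class."""

    def save(self, **kwargs):
        # Stand-in for the parent's routine: snapshot whatever is in self.record
        self.saved_record = dict(self.record)


class PipelineSaveSketch(PersistableSketch):
    def __init__(self, params, transformers, feature_names):
        self.record = {}
        self.params = params
        self.transformers = transformers  # list of (name, transformer) tuples
        self.feature_names = feature_names

    def save(self, **kwargs):
        # Additional routines run first, then the parent persists everything
        self.record["params"] = dict(self.params)
        self.record["transformers"] = [name for name, _ in self.transformers]
        self.record["features"] = list(self.feature_names)
        super().save(**kwargs)


p = PipelineSaveSketch({"scale": True}, [("scale", object())], ["a", "b"])
p.save()
print(p.saved_record)
# {'params': {'scale': True}, 'transformers': ['scale'], 'features': ['a', 'b']}
```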
-
transform(X, dataset_split=None, return_y=False, **kwargs)
Pass-through method to the external pipeline.
Parameters:
- X – dataframe/matrix to transform; if None, the internal dataset is used
- return_y – whether to return y with the output (only used if X is None); necessary for fitting a supervised model afterwards
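The interaction between `X`, `dataset_split`, and `return_y` might look like the following sketch. It assumes splits are stored as `(X, y)` tuples keyed by name; `AddOne`, `TransformSketch`, and the `"TEST"` key are hypothetical:

```python
class AddOne:
    """Hypothetical toy external pipeline."""

    def transform(self, X):
        return [[v + 1 for v in row] for row in X]


class TransformSketch:
    def __init__(self, external_pipeline, splits):
        self.external_pipeline = external_pipeline
        self.splits = splits  # assumption: {split_name: (X, y)}

    def transform(self, X, dataset_split=None, return_y=False):
        if X is None:
            # Fall back to the internal dataset split
            X, y = self.splits[dataset_split]
            output = self.external_pipeline.transform(X)
            return (output, y) if return_y else output
        # External data: return_y is ignored because there is no y to return
        return self.external_pipeline.transform(X)


t = TransformSketch(AddOne(), {"TEST": ([[1], [2]], [0, 1])})
print(t.transform(None, dataset_split="TEST", return_y=True))  # ([[2], [3]], [0, 1])
print(t.transform([[5]], return_y=True))  # [[6]]
```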
-
class simpleml.pipelines.base_pipeline.GeneratorPipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)
Bases: simpleml.pipelines.base_pipeline.Pipeline
Generator form of the pipeline. Overrides the standard methods with ones that return generator objects.
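Returning generator objects means work happens lazily, one batch at a time. A minimal sketch with hypothetical names (`DoubleValues`, `GeneratorTransformSketch`), assuming the input is already an iterable of batches:

```python
from typing import Iterable, Iterator, List

Batch = List[List[float]]


class DoubleValues:
    """Hypothetical toy external pipeline."""

    def transform(self, X: Batch) -> Batch:
        return [[2 * v for v in row] for row in X]


class GeneratorTransformSketch:
    def __init__(self, external_pipeline):
        self.external_pipeline = external_pipeline

    def transform(self, batches: Iterable[Batch]) -> Iterator[Batch]:
        # Lazily transform one batch at a time instead of materializing everything
        for batch in batches:
            yield self.external_pipeline.transform(batch)


gen = GeneratorTransformSketch(DoubleValues()).transform([[[1.0]], [[2.0], [3.0]]])
print(next(gen))  # [[2.0]]
print(list(gen))  # [[[4.0], [6.0]]]
```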
-
created_timestamp
-
dataset
-
dataset_id
-
filepaths
-
fit(**kwargs)
Pass-through method to the external pipeline. Assumes the underlying pipeline can make use of a generator to fit.
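Fitting from a generator requires an estimator that can update its state batch by batch, much like scikit-learn's `partial_fit` methods (for example `StandardScaler.partial_fit`). The sketch below is a hypothetical transformer that learns a mean in a single streaming pass; `IncrementalMeanSketch` and `batch_generator` are illustrative names, not SimpleML's API:

```python
from typing import Iterable, List


class IncrementalMeanSketch:
    """Hypothetical transformer whose fit consumes batches from a generator."""

    def fit(self, batches: Iterable[List[float]]):
        total, count = 0.0, 0
        for batch in batches:  # only one batch is held in memory at a time
            total += sum(batch)
            count += len(batch)
        if count == 0:
            raise ValueError("cannot fit on an empty generator")
        self.mean_ = total / count
        return self

    def transform(self, batch: List[float]) -> List[float]:
        return [v - self.mean_ for v in batch]


def batch_generator(values, batch_size):
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


model = IncrementalMeanSketch().fit(batch_generator([1.0, 2.0, 3.0, 6.0], batch_size=3))
print(model.mean_)  # 3.0
```

A one-shot generator is exhausted after a single pass, so an estimator that needs several epochs would need a fresh generator per epoch or a looping one.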
-
get_dataset_split(split=None, infinite_loop=False, batch_size=32, shuffle=True, **kwargs)
Get a specific dataset split.
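One plausible reading of the `infinite_loop`, `batch_size`, and `shuffle` parameters is sketched below as a standalone function. This is an assumption about their semantics, not SimpleML's implementation; `batch_split_sketch` and its extra `seed` parameter are hypothetical:

```python
import random
from itertools import islice
from typing import Iterator, List, Sequence, Tuple


def batch_split_sketch(
    X: Sequence,
    y: Sequence,
    batch_size: int = 32,
    shuffle: bool = True,
    infinite_loop: bool = False,
    seed: int = 0,  # hypothetical: added only to make shuffling reproducible
) -> Iterator[Tuple[List, List]]:
    """Yield (X_batch, y_batch) tuples; a hypothetical stand-in, not SimpleML's code."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    while True:
        if shuffle:
            rng.shuffle(indices)  # reshuffle at the start of every pass
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield [X[i] for i in batch], [y[i] for i in batch]
        if not infinite_loop:
            return  # a single pass over the split


X, y = [10, 20, 30], [0, 1, 1]
print(list(batch_split_sketch(X, y, batch_size=2, shuffle=False)))
# [([10, 20], [0, 1]), ([30], [1])]
endless = batch_split_sketch(X, y, batch_size=2, shuffle=False, infinite_loop=True)
print(list(islice(endless, 3)))
# [([10, 20], [0, 1]), ([30], [1]), ([10, 20], [0, 1])]
```

With `infinite_loop=True` the generator never ends on its own, which suits training loops that decide how many batches to draw (here via `itertools.islice`).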
-
has_external_files
-
hash_
-
id
-
metadata_
-
modified_timestamp
-
name
-
params
-
project
-
registered_name
-
transform(X, dataset_split=None, return_y=False, **kwargs)
Pass-through method to the external pipeline.
Parameters:
- X – dataframe/matrix to transform; if None, the internal dataset is used
- return_y – whether to return y with the output (only used if X is None); necessary for fitting a supervised model afterwards
-
version
-
version_description
-
class simpleml.pipelines.base_pipeline.Pipeline(has_external_files=True, transformers=[], external_pipeline_class='default', fitted=False, **kwargs)
Bases: simpleml.pipelines.base_pipeline.AbstractPipeline
Base class for all Pipeline objects.
dataset_id: foreign key relation to the dataset used as input
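A foreign key like `dataset_id` can be illustrated with the standard-library `sqlite3` module. The `JSONB` column type above implies a PostgreSQL backend, so this is only an analogy; the `datasets` and `pipelines` tables and their columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE pipelines ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "dataset_id INTEGER REFERENCES datasets(id))"  # foreign key to the input dataset
)
conn.execute("INSERT INTO datasets (id, name) VALUES (1, 'training_data')")
conn.execute("INSERT INTO pipelines (name, dataset_id) VALUES ('baseline', 1)")

# A pipeline pointing at a dataset that does not exist is rejected
try:
    conn.execute("INSERT INTO pipelines (name, dataset_id) VALUES ('orphan', 99)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

# Follow the relation from a pipeline back to its input dataset
row = conn.execute(
    "SELECT d.name FROM pipelines AS p JOIN datasets AS d ON p.dataset_id = d.id "
    "WHERE p.name = 'baseline'"
).fetchone()
print(row)  # ('training_data',)
```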
-
created_timestamp
-
dataset
-
dataset_id
-
filepaths
-
has_external_files
-
hash_
-
id
-
metadata_
-
modified_timestamp
-
name
-
params
-
project
-
registered_name
-
version
-
version_description