simpleml.datasets.numpy.pipeline

Pipeline derived datasets

Module Contents

Classes

NumpyPipelineDataset

Dataset class with a predefined build routine

Attributes

__author__

simpleml.datasets.numpy.pipeline.__author__ = Elisha Yadgaran

class simpleml.datasets.numpy.pipeline.NumpyPipelineDataset(*args, **kwargs)

Bases: simpleml.datasets.numpy.base.BaseNumpyDataset

Dataset class with a predefined build routine, assuming dataset pipeline existence.

WARNING: this class will fail unless build_dataframe is overwritten or a pipeline is provided!

Parameters

label_columns: Optional list of column names to register as the “y” split section.

other_named_split_sections: Optional map of section names to lists of column names for other arbitrary split columns. Section names must match the expected consumer signatures (e.g. sample_weights) because they are passed through untouched downstream (e.g. sklearn.fit(**split)).

All other columns in the dataframe will automatically be referenced as “X”
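The column-to-section mapping described above can be illustrated with a small standalone sketch. The helper `split_sections` and the column names are hypothetical and not part of simpleml; the sketch only shows the documented rule that label columns go to “y”, named sections keep their own keys, and everything else lands in “X”. The key `sample_weight` is used here because that is the keyword scikit-learn estimators commonly accept in `fit`.

```python
from typing import Dict, List, Optional


def split_sections(
    all_columns: List[str],
    label_columns: Optional[List[str]] = None,
    other_named_split_sections: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Hypothetical helper (not simpleml API): map section names to columns."""
    sections: Dict[str, List[str]] = {}
    if label_columns:
        sections["y"] = list(label_columns)
    # Named sections keep their keys so they can be passed on as keyword arguments
    for name, columns in (other_named_split_sections or {}).items():
        sections[name] = list(columns)
    # Every column not claimed by another section falls into "X"
    claimed = {col for cols in sections.values() for col in cols}
    sections["X"] = [col for col in all_columns if col not in claimed]
    return sections


example = split_sections(
    all_columns=["age", "income", "weight", "target"],
    label_columns=["target"],
    other_named_split_sections={"sample_weight": ["weight"]},
)
print(example)
# {'y': ['target'], 'sample_weight': ['weight'], 'X': ['age', 'income']}
```

Because the section names are forwarded untouched, a misspelled key would surface only when the downstream consumer rejects the unexpected keyword.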

build_dataframe(self)

Transform the raw dataset via the dataset pipeline to produce a production-ready dataset. Overwrite this method to disable the raw dataset requirement.

Return type

None
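
The override pattern mentioned for build_dataframe might look roughly like the following. This is a minimal sketch using a stand-in base class, not the real NumpyPipelineDataset: the names `PipelineDatasetSketch`, `InMemoryDataset`, the `pipeline` callable, and the `dataframe` attribute are all assumptions made for illustration, and simpleml's actual internals may differ.

```python
from typing import Callable, List, Optional

Rows = List[List[float]]


class PipelineDatasetSketch:
    """Hypothetical stand-in for a pipeline-backed dataset (not simpleml code)."""

    def __init__(self, pipeline: Optional[Callable[[], Rows]] = None) -> None:
        # Assumed: the pipeline is a callable that returns transformed rows
        self.pipeline = pipeline
        self.dataframe: Optional[Rows] = None

    def build_dataframe(self) -> None:
        # Default build requires a pipeline, mirroring the documented warning
        if self.pipeline is None:
            raise ValueError("No pipeline provided; overwrite build_dataframe instead")
        self.dataframe = self.pipeline()


class InMemoryDataset(PipelineDatasetSketch):
    """Overwrites the build so that no pipeline or raw dataset is needed."""

    def build_dataframe(self) -> None:
        # Populate the data directly and return None, as the documented method does
        self.dataframe = [[1.0, 2.0], [3.0, 4.0]]


dataset = InMemoryDataset()
dataset.build_dataframe()
print(dataset.dataframe)
# [[1.0, 2.0], [3.0, 4.0]]
```

In this sketch the base class raises when neither a pipeline nor an override is present, which is one plausible reading of the warning on the class.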