`simpleml.datasets.numpy`

Dataset library support for NumPy

Package Contents

Classes

`BaseNumpyDataset`	Assumes self.dataframe is a dictionary of numpy ndarrays
`NumpyPipelineDataset`	Dataset class with a predefined build routine

Attributes

__author__

simpleml.datasets.numpy.__author__ = Elisha Yadgaran

class simpleml.datasets.numpy.BaseNumpyDataset(*args, **kwargs)

Bases: simpleml.datasets.base_dataset.Dataset

Assumes self.dataframe is a dictionary of numpy ndarrays

Parameters

label_columns – Optional list of column names to register as the “y” split section.
other_named_split_sections – Optional map of section names to lists of column names for other arbitrary split columns. These must match the expected consumer signatures (e.g. sample_weights) because they are passed through untouched downstream (e.g. sklearn.fit(**split)).

All other columns in the dataframe will automatically be referenced as “X”
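The column assignment described above can be sketched in plain Python. This is only an illustration of the rule (registered sections first, everything else becomes “X”), not simpleml's implementation; the function name `partition_columns` and the example column names are hypothetical.

```python
from typing import Dict, List, Optional


def partition_columns(
    all_columns: List[str],
    label_columns: Optional[List[str]] = None,
    other_named_split_sections: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Map each split section name to the column names it owns (illustrative)."""
    label_columns = label_columns or []
    other_named_split_sections = other_named_split_sections or {}

    # Explicitly registered sections: "y" plus any extra named sections.
    sections = {"y": list(label_columns)}
    for name, columns in other_named_split_sections.items():
        sections[name] = list(columns)

    # Every column not claimed by a registered section falls through to "X".
    claimed = {column for columns in sections.values() for column in columns}
    sections["X"] = [column for column in all_columns if column not in claimed]
    return sections


# Hypothetical column names, chosen only for the example.
sections = partition_columns(
    all_columns=["age", "income", "weight", "target"],
    label_columns=["target"],
    other_named_split_sections={"sample_weights": ["weight"]},
)
print(sections)
```

Because the extra sections are passed through untouched, a key such as `sample_weights` has to match the keyword argument the downstream consumer expects.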

property X(self)

Return the subset that isn’t in the target labels

Return type: numpy.ndarray

get(self, column, split)

Explicitly split validation splits. Assumes self.dataframe has a get method that returns a dictionary of {‘X’: X, ‘y’: y}. Uses self.label_columns if y is named something else, looking only at the first entry in the list.

Returns None for any combination of column/split that isn’t present.

Parameters

column (str)
split (str)

Return type: numpy.ndarray
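The lookup behaviour described for get can be approximated with nested dictionaries. The sketch below assumes a layout of split name to column name to ndarray; the helper `get_column` and the split names `TRAIN` and `TEST` are hypothetical and are not taken from simpleml.

```python
from typing import Dict, List, Optional

import numpy as np


def get_column(
    dataframe: Dict[str, Dict[str, np.ndarray]],
    column: str,
    split: str,
    label_columns: Optional[List[str]] = None,
) -> Optional[np.ndarray]:
    """Look up one column of one split, returning None if either is missing."""
    # If the labels are stored under another name, use the first label column.
    if column == "y" and label_columns:
        column = label_columns[0]
    # Chained dict.get calls return None instead of raising KeyError.
    return dataframe.get(split, {}).get(column)


# Assumed layout: split name -> column name -> ndarray.
dataframe = {
    "TRAIN": {"X": np.arange(6).reshape(3, 2), "target": np.array([0, 1, 0])},
    "TEST": {"X": np.arange(4).reshape(2, 2)},
}

train_y = get_column(dataframe, "y", "TRAIN", label_columns=["target"])
missing = get_column(dataframe, "y", "VALIDATION", label_columns=["target"])
```

Returning None rather than raising lets a caller probe for optional splits without wrapping each lookup in exception handling.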

get_feature_names(self)

Should return a list of the features in the dataset

Return type: List[str]

get_split_names(self)

Helper to expose the splits contained in the dataset

Return type: List[str]

property y(self)

Return the target label columns

Return type: numpy.ndarray

class simpleml.datasets.numpy.NumpyPipelineDataset(*args, **kwargs)

Bases: simpleml.datasets.numpy.base.BaseNumpyDataset

Dataset class with a predefined build routine, assuming dataset pipeline existence.

WARNING: this class will fail unless build_dataframe is overridden or a pipeline is provided!
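The failure mode in the warning can be illustrated with a stand-in class. Everything below is an assumption made for the example: `PipelineDatasetSketch`, `HandBuiltDataset`, the callable pipeline, and the `ValueError` are hypothetical and do not reproduce simpleml's actual classes, signatures, or exception types. Only the method name build_dataframe comes from the documentation above.

```python
from typing import Callable, Dict, Optional

import numpy as np


class PipelineDatasetSketch:
    """Stand-in for illustration only; not simpleml's implementation."""

    def __init__(
        self, pipeline: Optional[Callable[[], Dict[str, np.ndarray]]] = None
    ) -> None:
        self.pipeline = pipeline
        self.dataframe: Optional[Dict[str, np.ndarray]] = None

    def build_dataframe(self) -> None:
        # Predefined build: delegate to the pipeline, failing if there is none.
        if self.pipeline is None:
            raise ValueError("Provide a pipeline or override build_dataframe")
        self.dataframe = self.pipeline()


class HandBuiltDataset(PipelineDatasetSketch):
    def build_dataframe(self) -> None:
        # Overriding the build removes the need for a pipeline.
        self.dataframe = {"X": np.zeros((2, 3)), "y": np.ones(2)}


dataset = HandBuiltDataset()
dataset.build_dataframe()
```

In this sketch, calling build_dataframe on the base stand-in with no pipeline raises, while the subclass builds its dictionary of arrays directly.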