simpleml.datasets.numpy.base

Numpy Module for external “dataframe”

Inherit and extend for particular patterns. It is a bit of a misnomer to use the term “dataframe”, since there are very few expected attributes and they are by no means unique to pandas.

Module Contents

Classes

BaseNumpyDataset

Assumes self.dataframe is a dictionary of numpy ndarrays

Attributes

LOGGER

__author__

simpleml.datasets.numpy.base.LOGGER
simpleml.datasets.numpy.base.__author__ = Elisha Yadgaran
class simpleml.datasets.numpy.base.BaseNumpyDataset(*args, **kwargs)

Bases: simpleml.datasets.base_dataset.Dataset

Assumes self.dataframe is a dictionary of numpy ndarrays

Parameters
  • label_columns – Optional list of column names to register as the “y” split section

  • other_named_split_sections – Optional map of section names to lists of column names for other arbitrary split columns. Names must match expected consumer signatures (e.g. sample_weights) because they are passed through untouched downstream (e.g. sklearn.fit(**split))

All other columns in the dataframe will automatically be referenced as “X”
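The column-to-section rule above can be illustrated with a small standalone sketch. It does not use simpleml; `partition_columns` and the sample column names are hypothetical, and the real class may resolve sections differently.

```python
import numpy as np


def partition_columns(data, label_columns=None, other_named_split_sections=None):
    """Group the keys of `data` into named split sections (hypothetical helper)."""
    label_columns = list(label_columns or [])
    other = {
        name: list(cols)
        for name, cols in (other_named_split_sections or {}).items()
    }

    # Every column claimed by "y" or by a named section is excluded from "X".
    claimed = set(label_columns)
    for cols in other.values():
        claimed.update(cols)

    sections = {"X": [c for c in data if c not in claimed], "y": label_columns}
    sections.update(other)
    return sections


# Assumed layout: a dictionary of numpy ndarrays keyed by column name.
data = {
    "age": np.array([31, 45, 27]),
    "income": np.array([52.0, 61.5, 38.2]),
    "label": np.array([0, 1, 0]),
    "weight": np.array([1.0, 0.5, 2.0]),
}

sections = partition_columns(
    data,
    label_columns=["label"],
    other_named_split_sections={"sample_weights": ["weight"]},
)
```

Here "age" and "income" fall into “X” because nothing else claims them, while "weight" is registered under a name a downstream consumer would recognize.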

property X(self)

Return the subset that isn’t in the target labels

Return type

numpy.ndarray

get(self, column, split)

Explicitly split validation splits. Assumes self.dataframe has a get method to return a dictionary of {‘X’: X, ‘y’: y}. Uses self.label_columns if y is named something else; only the first entry in the list is consulted.

Returns None for any combination of column/split that isn’t present.

Parameters
  • column (str) –

  • split (str) –

Return type

numpy.ndarray
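A minimal sketch of the lookup behavior described for get, written as a standalone function rather than the library's implementation. The nested layout (split name mapping to a dictionary of arrays), the function name `lookup`, and the "TRAIN"/"TEST" split names are assumptions made for illustration.

```python
import numpy as np


def lookup(data, column, split, label_columns=None):
    """Return the array for (column, split), or None if either is absent.

    Hypothetical stand-in for the documented behavior; `data` is assumed
    to map split name -> {column name -> ndarray}.
    """
    section = data.get(split, {})

    # If "y" is stored under another name, fall back to the first label column.
    if column == "y" and "y" not in section and label_columns:
        column = label_columns[0]

    return section.get(column)


data = {
    "TRAIN": {
        "X": np.array([[1.0, 2.0], [3.0, 4.0]]),
        "target": np.array([0, 1]),
    },
}

train_X = lookup(data, "X", "TRAIN")
train_y = lookup(data, "y", "TRAIN", label_columns=["target"])
missing = lookup(data, "X", "TEST")  # split not present, so None
```

Returning None instead of raising lets callers probe for optional splits without wrapping every call in exception handling, at the cost of having to check the result before use.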

get_feature_names(self)

Should return a list of the features in the dataset

Return type

List[str]

get_split_names(self)

Helper to expose the splits contained in the dataset

Return type

List[str]

property y(self)

Return the target label columns

Return type

numpy.ndarray