simpleml.datasets.pandas_mixin

Pandas Module for external dataframes

Inherit and extend for particular patterns. It is a bit of a misnomer to use the term “dataframe”, since there are very few expected attributes and they are by no means unique to pandas.

Module Contents

Classes

PandasDatasetMixin

“Pandas”esque mixin class with control mechanism for self.dataframe of type dataframe.

simpleml.datasets.pandas_mixin.DATAFRAME_SPLIT_COLUMN = DATASET_SPLIT
simpleml.datasets.pandas_mixin.__author__ = Elisha Yadgaran
class simpleml.datasets.pandas_mixin.PandasDatasetMixin

Bases: simpleml.datasets.abstract_mixin.AbstractDatasetMixin

“Pandas”esque mixin class with control mechanism for self.dataframe of type dataframe. Only assumes pandas syntax, not types, so should be compatible with pandas drop-in replacements.

In particular:
A - type of pd.DataFrame:
  • query()

  • columns

  • drop()

  • __getitem__()

  • squeeze()

B - any other type:
  • get()

  • __getitem__()

  • squeeze()
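
As a rough illustration of the type-A surface listed above, the following sketch exercises each operation on a plain pd.DataFrame. The column names, the split names, and the 'DATASET_SPLIT' literal are assumptions made for the example; it is not the mixin's implementation.

```python
import pandas as pd

# Hypothetical data; column and split names are illustrative only.
df = pd.DataFrame({
    "DATASET_SPLIT": ["TRAIN", "TRAIN", "TEST"],
    "feature": [1.0, 2.0, 3.0],
    "label": [0, 1, 0],
})

# query(): filter the rows belonging to one split
train = df.query("DATASET_SPLIT == 'TRAIN'")

# columns: inspect the available column names
column_names = list(train.columns)

# drop(): remove the bookkeeping and label columns
features = train.drop(columns=["DATASET_SPLIT", "label"])

# __getitem__(): select a subset of columns
labels = train[["label"]]

# squeeze(): collapse the single-column frame into a Series
label_series = labels.squeeze()
```

Any drop-in replacement that supports the same calls with the same semantics should be usable in place of pandas here.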

property X(self)

Return the subset of columns that isn’t in the target labels (across all potential splits).

concatenate_dataframes(self, dataframes, split_names)

Helper method to merge dataframes into a single one with the split specified under DATAFRAME_SPLIT_COLUMN
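
A minimal sketch of what such a merge could look like, assuming the split column holds the literal name 'DATASET_SPLIT' and that the inputs are ordinary pandas dataframes. The standalone function and the split names 'TRAIN' and 'TEST' are hypothetical; the actual method may differ.

```python
import pandas as pd

# Assumed literal value; the module only states DATAFRAME_SPLIT_COLUMN = DATASET_SPLIT.
DATAFRAME_SPLIT_COLUMN = "DATASET_SPLIT"


def concatenate_dataframes(dataframes, split_names):
    """Stack dataframes, tagging each row with the name of its split."""
    tagged = []
    for frame, name in zip(dataframes, split_names):
        frame = frame.copy()  # avoid mutating the caller's dataframe
        frame[DATAFRAME_SPLIT_COLUMN] = name
        tagged.append(frame)
    return pd.concat(tagged, axis=0)


train = pd.DataFrame({"feature": [1, 2], "label": [0, 1]})
test = pd.DataFrame({"feature": [3], "label": [1]})
combined = concatenate_dataframes([train, test], ["TRAIN", "TEST"])
```

Keeping the split name in a column lets a single dataframe carry every split, which can then be filtered back out by that column.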

get(self, column, split)

Explicitly separates the validation splits. Assumes self.dataframe has a get method that returns the dataframe associated with the split. Uses self.label_columns to separate the X and y columns inside the returned dataframe.

Returns an empty dataframe for missing combinations of column and split.
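
The lookup behavior can be sketched roughly as follows. This is an illustration under several assumptions, not the library's code: it works on a single dataframe with a split column rather than on an object exposing its own get method, it treats 'X' and 'y' as the only valid values of column, and LABEL_COLUMNS stands in for self.label_columns.

```python
import pandas as pd

SPLIT_COLUMN = "DATASET_SPLIT"  # assumed literal value
LABEL_COLUMNS = ["label"]       # hypothetical stand-in for self.label_columns


def get(dataframe, column, split):
    """Return the X or y portion of one split, or an empty dataframe."""
    if column not in ("X", "y"):
        return pd.DataFrame()  # unknown column request
    # Boolean mask used instead of query() to keep the sketch simple
    rows = dataframe[dataframe[SPLIT_COLUMN] == split]
    if rows.empty:
        return pd.DataFrame()  # unknown or empty split
    rows = rows.drop(columns=[SPLIT_COLUMN])
    if column == "y":
        return rows[LABEL_COLUMNS]
    return rows.drop(columns=LABEL_COLUMNS)


df = pd.DataFrame({
    "DATASET_SPLIT": ["TRAIN", "TRAIN", "TEST"],
    "feature": [1.0, 2.0, 3.0],
    "label": [0, 1, 0],
})
train_X = get(df, "X", "TRAIN")
test_y = get(df, "y", "TEST")
missing = get(df, "X", "VALIDATION")
```

Returning an empty dataframe rather than raising lets callers handle a missing split without special-casing errors.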

get_feature_names(self)

Should return a list of the features in the dataset

static load_csv(filename, **kwargs)

Helper method to read in a csv file
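
A plausible reading of this helper is a thin wrapper that forwards its keyword arguments to pandas.read_csv. That forwarding is an assumption, not something the page confirms. The sketch below reads from an in-memory buffer so that it runs without a file on disk.

```python
import io

import pandas as pd


def load_csv(filename, **kwargs):
    """Read a CSV into a dataframe, passing extra options to pandas."""
    return pd.read_csv(filename, **kwargs)


# read_csv accepts a path or a file-like object; a buffer is used here.
csv_text = io.StringIO("feature;label\n1;0\n2;1\n")
df = load_csv(csv_text, sep=";")
```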

property y(self)

Return the target label columns
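
To make the relationship between X, y, and get_feature_names concrete, here is a small sketch built on a hypothetical TinyDataset class. It is not the simpleml class, and it ignores splits entirely; it only shows the column partitioning implied by the descriptions above.

```python
import pandas as pd


class TinyDataset:
    """Hypothetical stand-in used for illustration only."""

    def __init__(self, dataframe, label_columns):
        self.dataframe = dataframe
        self.label_columns = label_columns

    @property
    def X(self):
        # Everything that is not a target label
        return self.dataframe.drop(columns=self.label_columns)

    @property
    def y(self):
        # Only the target label columns
        return self.dataframe[self.label_columns]

    def get_feature_names(self):
        return self.X.columns.tolist()


dataset = TinyDataset(
    pd.DataFrame({"age": [30, 41], "income": [50.0, 62.5], "label": [0, 1]}),
    label_columns=["label"],
)
feature_names = dataset.get_feature_names()
```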