simpleml.datasets.pandas_mixin module

Pandas module for external dataframes.

Inherit from and extend this mixin to support particular data patterns. The term “dataframe” is a bit of a misnomer: the mixin expects very few attributes, and none of them are unique to pandas.

class simpleml.datasets.pandas_mixin.PandasDatasetMixin

Bases: simpleml.datasets.abstract_mixin.AbstractDatasetMixin

“Pandas”-esque mixin class that provides a control mechanism for self.dataframe, a dataframe-like object. It assumes only pandas syntax, not pandas types, so it should be compatible with pandas drop-in replacements.

In particular, it expects:
A - objects of type pd.DataFrame:
  • query()
  • columns
  • drop()
  • __getitem__()
  • squeeze()
B - objects of any other type:
  • get()
  • __getitem__()
  • squeeze()
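Because the mixin relies on duck typing instead of isinstance checks against pandas, the two expected interfaces can be written down explicitly with typing.Protocol. This is an illustrative sketch only: DataFrameLike, SplitContainerLike and SplitDict are hypothetical names, not part of simpleml, and the method signatures are loosened to Any because only the attribute names are stated above.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataFrameLike(Protocol):
    """Case A: the pd.DataFrame-style interface (hypothetical name)."""

    columns: Any

    def query(self, expr: str) -> Any: ...
    def drop(self, *args: Any, **kwargs: Any) -> Any: ...
    def __getitem__(self, key: Any) -> Any: ...
    def squeeze(self, *args: Any, **kwargs: Any) -> Any: ...


@runtime_checkable
class SplitContainerLike(Protocol):
    """Case B: any other container keyed by split name (hypothetical name)."""

    def get(self, key: Any, default: Any = None) -> Any: ...
    def __getitem__(self, key: Any) -> Any: ...
    def squeeze(self, *args: Any, **kwargs: Any) -> Any: ...


class SplitDict(dict):
    """Hypothetical case-B container: a dict mapping split name -> data."""

    def squeeze(self):
        # Mirror pandas' spirit: collapse a single-entry container to its value.
        return next(iter(self.values())) if len(self) == 1 else self


splits = SplitDict(TRAIN=[1, 2])
```

Note that runtime_checkable isinstance checks only confirm that the attributes exist; they do not validate signatures or return types, which matches the "syntax, not types" stance of the mixin.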
X

Return the subset of the data that is not in the target label columns (across all potential splits).
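A minimal sketch of how the X / y separation could look on a single pd.DataFrame, assuming the label column names are available as a list (label_columns, as used by get() below). It ignores splits entirely, and the helper names features and labels are hypothetical, not simpleml functions.

```python
import pandas as pd


def features(df: pd.DataFrame, label_columns: list) -> pd.DataFrame:
    """Hypothetical helper: every column that is not a target label (X)."""
    return df.drop(columns=label_columns)


def labels(df: pd.DataFrame, label_columns: list):
    """Hypothetical helper: only the target label columns (y).

    squeeze(axis=1) turns a single label column into a Series but leaves a
    DataFrame unchanged when there are several label columns.
    """
    return df[label_columns].squeeze(axis=1)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "target": [0, 1]})
X = features(df, ["target"])
y = labels(df, ["target"])
```

Passing axis=1 to squeeze() matters: a bare squeeze() on a one-row, one-column frame would collapse all the way to a scalar.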

concatenate_dataframes(dataframes, split_names)

Helper method to merge dataframes into a single dataframe, recording each row's split under DATAFRAME_SPLIT_COLUMN.
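A sketch of the idea behind concatenate_dataframes, written as a standalone function. The value "DATASET_SPLIT" for DATAFRAME_SPLIT_COLUMN is an assumption (the real constant may differ), and resetting the index with ignore_index=True is a choice of this sketch, not necessarily what simpleml does.

```python
import pandas as pd

# Assumed value; simpleml's actual DATAFRAME_SPLIT_COLUMN constant may differ.
DATAFRAME_SPLIT_COLUMN = "DATASET_SPLIT"


def concatenate_dataframes(dataframes, split_names):
    """Stack one dataframe per split, tagging every row with its split name."""
    if len(dataframes) != len(split_names):
        raise ValueError("need exactly one split name per dataframe")
    tagged = [
        # assign() returns a copy, so the caller's dataframes are not modified.
        df.assign(**{DATAFRAME_SPLIT_COLUMN: name})
        for df, name in zip(dataframes, split_names)
    ]
    # ignore_index=True gives the merged frame a fresh 0..n-1 index.
    return pd.concat(tagged, ignore_index=True)


train = pd.DataFrame({"a": [1, 2]})
test = pd.DataFrame({"a": [3]})
merged = concatenate_dataframes([train, test], ["TRAIN", "TEST"])
```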

get(column, split)

Explicitly split validation splits. Assumes self.dataframe has a get method that returns the dataframe associated with the split. Uses self.label_columns to separate the X and y columns inside the returned dataframe.

Returns an empty dataframe for missing combinations of column and split.
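A rough sketch of the retrieval logic for case A (a single pd.DataFrame whose rows are tagged by split, as produced by concatenate_dataframes). Several details are assumptions rather than confirmed behavior: the column argument takes the values "X" or "y", split=None means "all splits", the DATAFRAME_SPLIT_COLUMN value is made up, and label_columns is passed explicitly instead of being read from self.

```python
import pandas as pd

# Assumed value; simpleml's actual DATAFRAME_SPLIT_COLUMN constant may differ.
DATAFRAME_SPLIT_COLUMN = "DATASET_SPLIT"


def get(df, label_columns, column, split=None):
    """Return the X or y portion of one split, or of all splits if split is None."""
    if split is not None:
        # Keep only rows tagged with the requested split; a missing split
        # yields zero rows rather than an error.
        df = df.query(f"{DATAFRAME_SPLIT_COLUMN} == @split")
    # The split tag is bookkeeping, neither a feature nor a label.
    df = df.drop(columns=[DATAFRAME_SPLIT_COLUMN])
    if column == "X":
        return df.drop(columns=label_columns)
    if column == "y":
        # One label column becomes a Series; several stay a DataFrame.
        return df[label_columns].squeeze(axis=1)
    # Unknown column name: fall back to an empty dataframe.
    return pd.DataFrame()


merged = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "target": [0, 1, 0],
        DATAFRAME_SPLIT_COLUMN: ["TRAIN", "TRAIN", "TEST"],
    }
)
X_train = get(merged, ["target"], "X", "TRAIN")
y_test = get(merged, ["target"], "y", "TEST")
```

The @split syntax lets query() reference the local Python variable directly, which keeps the filter on the query() interface listed for case A.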

get_feature_names()

Should return a list of the feature names in the dataset.

static load_csv(filename, **kwargs)

Helper method to read in a CSV file.
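A sketch of a load_csv-style helper, assuming the extra keyword arguments are forwarded to pandas.read_csv (the signature suggests this, but it is not stated above).

```python
import io

import pandas as pd


def load_csv(filename, **kwargs):
    """Read a CSV into a DataFrame; assumes kwargs go straight to pandas.read_csv."""
    return pd.read_csv(filename, **kwargs)


# read_csv also accepts file-like objects, which keeps this demo self-contained.
csv_text = io.StringIO("a,b,target\n1,3,0\n2,4,1\n")
df = load_csv(csv_text, usecols=["a", "target"])
```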

static load_sql(query, connection, **kwargs)

Helper method to read in SQL data.
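A sketch of a load_sql-style helper, assuming the query, connection and keyword arguments are passed through to pandas.read_sql. The in-memory SQLite database is only there to make the example runnable; pandas accepts a sqlite3 connection directly.

```python
import sqlite3

import pandas as pd


def load_sql(query, connection, **kwargs):
    """Run a query and return a DataFrame; assumes kwargs go to pandas.read_sql."""
    return pd.read_sql(query, connection, **kwargs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (a INTEGER, target INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?)", [(1, 0), (2, 1)])
# params is a pandas.read_sql keyword, forwarded through **kwargs.
df = load_sql("SELECT a, target FROM samples WHERE a > ?", conn, params=(1,))
conn.close()
```

Using params instead of string formatting keeps the query safe from SQL injection and shows how keyword arguments flow through the helper.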

y

Return the target label columns.