simpleml.datasets.pandas.base
Pandas Module for datasets
Inherit and extend for particular patterns
Module Contents
Classes
BasePandasDataset: Pandas base class with control mechanism for self.dataframe of type pd.DataFrame
Attributes
- class simpleml.datasets.pandas.base.BasePandasDataset(squeeze_return=False, **kwargs)[source]
Bases:
simpleml.datasets.base_dataset.Dataset
Pandas base class with control mechanism for self.dataframe of type pd.DataFrame
- Parameters
squeeze_return (bool) – flag controlling whether dataframe.squeeze() is run on the value returned from self.get() calls. Useful for aligning return types with what other libraries expect (e.g. sklearn expects a one-dimensional y for a single label)
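The effect of squeezing can be seen with plain pandas. This is a minimal sketch that only demonstrates DataFrame.squeeze() itself; the frames and variable names are illustrative and are not taken from simpleml.

```python
import pandas as pd

# Single-column frame, as a label lookup might return for one target column.
y_frame = pd.DataFrame({"label": [0, 1, 1, 0]})

# squeeze() collapses length-1 axes, so one column becomes a Series.
y_series = y_frame.squeeze()

# A frame with several rows and columns has no length-1 axis,
# so squeeze() returns it as a DataFrame.
x_frame = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
x_squeezed = x_frame.squeeze()
```

Note that a frame with exactly one row and one column squeezes all the way down to a scalar, which may matter for very small splits.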
- property X(self)[source]
Return the subset of columns that are not target labels (across all potential splits)
- Return type
pandas.DataFrame
- property _dataframe(self)[source]
Override base behavior to return a copy of the data in case consumers attempt to mutate the data structure.
Only the pandas container is copied. Objects stored in cells (e.g. lists, dicts, custom objects) are shared with the original, so in-place mutations of those objects still propagate.
- Return type
pandas.DataFrame
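The copy caveat above can be reproduced with plain pandas. This sketch assumes the property behaves like DataFrame.copy(); the actual implementation is not shown here, and the frame below is made up for illustration.

```python
import pandas as pd

original = pd.DataFrame({"n": [1, 2], "tags": [["a"], ["b"]]})
snapshot = original.copy()  # copies the container and its values

# Reassigning a cell in the copy leaves the original untouched.
snapshot.at[0, "n"] = 99

# Mutating a cell object in place is visible through both frames,
# because both cells reference the same Python list.
snapshot.at[0, "tags"].append("shared")
```

If cell objects must be isolated as well, copy.deepcopy on the individual objects would be needed; DataFrame.copy() does not recurse into them.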
- static _get(dataframe, columns, split)[source]
Internal method to extract data subsets from a dataframe
- _validate_dtype(self, df)[source]
Validating setter method for self._external_file. Checks that the input is of type pd.DataFrame.
- Parameters
df (pandas.DataFrame) –
- Return type
None
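A type check of this kind might look like the following. The function name and the choice of TypeError are assumptions made for illustration; the library may raise its own error class.

```python
import pandas as pd


def validate_is_dataframe(df) -> None:
    """Raise if df is not a pandas DataFrame; return None otherwise."""
    if not isinstance(df, pd.DataFrame):
        # TypeError is an assumption, not the library's documented behavior.
        raise TypeError(f"Expected pd.DataFrame, got {type(df).__name__}")


validate_is_dataframe(pd.DataFrame({"a": [1]}))  # passes silently
```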
- static concatenate_dataframes(dataframes, split_names)[source]
Helper method to merge dataframes into a single one with the split specified under DATAFRAME_SPLIT_COLUMN
- Parameters
dataframes (List[pandas.DataFrame]) –
split_names (List[str]) –
- Return type
pandas.DataFrame
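One way to picture this helper is a row-wise concat in which each frame is first tagged with its split name. The sketch below is not the library's code: SPLIT_COLUMN is a hypothetical stand-in for DATAFRAME_SPLIT_COLUMN (whose real value is not given here), and resetting the index is an assumption.

```python
from typing import List

import pandas as pd

SPLIT_COLUMN = "split"  # hypothetical stand-in for DATAFRAME_SPLIT_COLUMN


def concatenate_with_split(
    dataframes: List[pd.DataFrame], split_names: List[str]
) -> pd.DataFrame:
    """Stack frames row-wise, tagging each row with the split it came from."""
    tagged = [
        df.assign(**{SPLIT_COLUMN: name})  # assign returns a new frame
        for df, name in zip(dataframes, split_names)
    ]
    # ignore_index=True renumbers rows; the real helper may keep the indexes.
    return pd.concat(tagged, axis=0, ignore_index=True)


train = pd.DataFrame({"a": [1, 2]})
test = pd.DataFrame({"a": [3]})
combined = concatenate_with_split([train, test], ["TRAIN", "TEST"])
```

Because assign() returns a new frame, the input dataframes are left without the extra column.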
- get(self, column, split)[source]
Explicitly split validation splits. Uses self.label_columns to separate x and y columns inside the returned dataframe.
Returns an empty dataframe for missing combinations of column & split.
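The behavior described above can be sketched roughly as follows. Everything in this example is hypothetical: the helper name, the "X"/"y" values for column, and the SPLIT_COLUMN name are assumptions chosen for illustration, not the library's API.

```python
from typing import List

import pandas as pd

SPLIT_COLUMN = "split"  # hypothetical name for the split marker column


def get_subset(
    df: pd.DataFrame, label_columns: List[str], column: str, split: str
) -> pd.DataFrame:
    """Select rows for one split, then the feature or label columns."""
    rows = df[df[SPLIT_COLUMN] == split].drop(columns=[SPLIT_COLUMN])
    if column == "y":
        return rows[label_columns]
    if column == "X":
        return rows.drop(columns=label_columns)
    # Unknown column selector: fall back to an empty frame.
    return pd.DataFrame()


data = pd.DataFrame(
    {"a": [1, 2, 3], "y": [0, 1, 0], "split": ["TRAIN", "TRAIN", "TEST"]}
)
train_x = get_subset(data, ["y"], "X", "TRAIN")
missing = get_subset(data, ["y"], "y", "VALIDATION")
```

In this sketch a split name with no matching rows yields a frame with zero rows, so callers can test the result with .empty instead of catching an exception.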
- get_feature_names(self)[source]
Should return a list of the features in the dataset
- Return type
List[str]
- get_split(self, split)[source]
Wrapper accessor to return a split object (for internal use)
- Parameters
split (Optional[str]) –
- Return type
simpleml.pipelines.validation_split_mixins.Split
- get_split_names(self)[source]
Helper to expose the splits contained in the dataset
- Return type
List[str]
- static merge_split(split)[source]
Helper method to merge all dataframes in a split object into a single df. Does a column-wise join.
ex: df1 = [A, B, C] (4 rows) + df2 = [D, E, F] (4 rows)
returns: [A, B, C, D, E, F] (4 rows)
- Parameters
split (simpleml.pipelines.validation_split_mixins.Split) –
- Return type
pandas.DataFrame
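The column-wise join in the example above corresponds to a pandas concat along axis=1. This is a minimal sketch using two plain dataframes in place of a Split object; whether the library uses pd.concat internally is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
df2 = pd.DataFrame({"D": [0, 1, 0, 1], "E": [2, 2, 2, 2], "F": [7, 7, 7, 7]})

# axis=1 places the frames side by side, aligning rows on the index.
merged = pd.concat([df1, df2], axis=1)
```

Because alignment happens on the index, frames whose indexes differ would produce extra rows filled with NaN rather than a clean 4-row result.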