simpleml.datasets.dask.base
Dask Module for datasets
Module Contents
Classes
Dask base class with control mechanism for self.dataframe of type dd.DataFrame
- class simpleml.datasets.dask.base.BaseDaskDataset(squeeze_return=False, **kwargs)
Bases: simpleml.datasets.base_dataset.Dataset
Dask base class with control mechanism for self.dataframe of type dd.DataFrame
- Parameters
squeeze_return (bool) – whether to call dataframe.squeeze() on the result of self.get() calls. Mainly needed to align return types with what other libraries expect (e.g. sklearn y with a single label).
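The effect of squeeze_return can be illustrated with a minimal sketch. It uses pandas rather than Dask to keep the example small, on the assumption that dd.DataFrame.squeeze() collapses a single-column frame in the same way. The frames and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical single-label target frame, similar to what a get() call for y might return
y_frame = pd.DataFrame({"label": [0, 1, 1]})

# squeeze() collapses the single column into a 1-D Series
y_series = y_frame.squeeze()

# A frame with several columns (and several rows) is returned unchanged
x_frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
x_squeezed = x_frame.squeeze()

print(type(y_series).__name__, type(x_squeezed).__name__)
```

With squeeze_return=False, the single-column result would presumably stay a dataframe, which some estimators reject or warn about.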
- property X(self)
Return the subset that isn’t in the target labels (across all potential splits)
- Return type
simpleml.imports.ddDataFrame
- property _dataframe(self)
Overrides the base behavior to return a copy of the data in case consumers attempt to mutate the data structure.
Only the container is copied; underlying cell objects (e.g. lists, dicts, objects) can still propagate in-place mutations.
- Return type
simpleml.imports.ddDataFrame
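The difference between copying the container and copying the cell objects can be sketched as follows. This is an illustration using pandas, not the library's own code, and the frame contents are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a mutable object (a list) stored in each cell
original = pd.DataFrame({"tags": [["a"], ["b"]]})

# Copying produces a new container, so structural changes stay local to the copy
copied = original.copy()
copied["extra"] = 0

# The cell objects themselves are shared, so an in-place mutation leaks through
copied["tags"].iloc[0].append("mutated")

print(list(original.columns))    # the "extra" column is absent from the original
print(original["tags"].iloc[0])  # the list mutation is visible in the original
```

Consumers that store mutable objects in cells would need their own deep copy of those objects to be fully isolated.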
- static _get(dataframe, columns, split)
Internal method to extract data subsets from a dataframe
- _validate_dtype(self, df)
Validating setter method for self._external_file. Checks that the input is of type dd.DataFrame.
- Parameters
df (simpleml.imports.ddDataFrame) –
- Return type
None
- static concatenate_dataframes(dataframes, split_names)
Helper method to merge dataframes into a single one with the split specified under DATAFRAME_SPLIT_COLUMN
- Parameters
dataframes (List[simpleml.imports.ddDataFrame]) –
split_names (List[str]) –
- Return type
simpleml.imports.ddDataFrame
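A minimal sketch of the idea behind concatenate_dataframes, assuming each frame is tagged with its split name and the frames are then stacked row-wise. It is written with pandas; in Dask the analogous calls would be assign and dd.concat. The helper name and the SPLIT_COLUMN value are placeholders, not the library's actual code or the real value of DATAFRAME_SPLIT_COLUMN.

```python
from typing import List

import pandas as pd

# Placeholder name; the real value of DATAFRAME_SPLIT_COLUMN is defined by simpleml
SPLIT_COLUMN = "SPLIT"


def concatenate_with_split(
    dataframes: List[pd.DataFrame], split_names: List[str]
) -> pd.DataFrame:
    """Tag each frame with its split name, then stack the frames row-wise."""
    tagged = [
        df.assign(**{SPLIT_COLUMN: name})
        for df, name in zip(dataframes, split_names)
    ]
    return pd.concat(tagged, ignore_index=True)


# Hypothetical train/test frames
train = pd.DataFrame({"a": [1, 2], "y": [0, 1]})
test = pd.DataFrame({"a": [3], "y": [1]})
combined = concatenate_with_split([train, test], ["TRAIN", "TEST"])
print(combined)
```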
- get(self, column, split)
Explicitly splits validation splits. Uses self.label_columns to separate x and y columns inside the returned dataframe.
Returns an empty dataframe for missing combinations of column and split.
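The behavior described for get can be approximated with the sketch below. It assumes that column takes the values "X" or "y" and that the split is stored in a dedicated column. The helper name, the SPLIT_COLUMN value, and the sample data are all hypothetical, and pandas stands in for Dask.

```python
from typing import List

import pandas as pd

# Placeholder name; the real split column is defined by simpleml
SPLIT_COLUMN = "SPLIT"


def get_subset(
    dataframe: pd.DataFrame, label_columns: List[str], column: str, split: str
) -> pd.DataFrame:
    """Select the rows of one split, then the feature or label columns."""
    # An unknown split matches no rows, which yields an empty frame
    rows = dataframe[dataframe[SPLIT_COLUMN] == split].drop(columns=[SPLIT_COLUMN])
    if column == "y":
        return rows[label_columns]
    if column == "X":
        return rows.drop(columns=label_columns)
    # An unknown column name also yields an empty frame
    return pd.DataFrame()


data = pd.DataFrame(
    {"a": [1, 2, 3], "y": [0, 1, 1], SPLIT_COLUMN: ["TRAIN", "TRAIN", "TEST"]}
)
x_train = get_subset(data, ["y"], "X", "TRAIN")
y_train = get_subset(data, ["y"], "y", "TRAIN")
missing = get_subset(data, ["y"], "X", "VALIDATION")
```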
- get_feature_names(self)
Should return a list of the features in the dataset
- Return type
List[str]
- get_split(self, split)
Wrapper accessor to return a split object (for internal use)
- Parameters
split (Optional[str]) –
- Return type
simpleml.pipelines.validation_split_mixins.Split
- get_split_names(self)
Helper to expose the splits contained in the dataset
- Return type
List[str]
- static merge_split(split)
Helper method to merge all dataframes in a split object into a single dataframe using a column-wise join.
Example: df1 = [A, B, C] (4 rows) + df2 = [D, E, F] (4 rows) returns [A, B, C, D, E, F] (4 rows).
- Parameters
split (simpleml.pipelines.validation_split_mixins.Split) –
- Return type
simpleml.imports.ddDataFrame
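The column-wise join from the merge_split example can be reproduced with a short sketch. A plain dict stands in for the Split object and pandas stands in for Dask; both are assumptions made for illustration, and the values are made up.

```python
import pandas as pd

# Two hypothetical frames sharing the same 4-row index
df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
df2 = pd.DataFrame({"D": [0, 1, 0, 1], "E": [1, 1, 0, 0], "F": [2, 2, 2, 2]})

# A plain dict stands in for the Split object here
split = {"X": df1, "y": df2}

# axis=1 aligns the frames on their index and places the columns side by side
merged = pd.concat(list(split.values()), axis=1)
print(merged.shape)
```

Because the join aligns on the index, frames whose indexes differ would produce missing values rather than a clean 4-row result.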