simpleml.persistables.saving module

Module to define the mixins that support different persistence patterns for external objects.

Nomenclature -> Save Location : Save Format

  • Database Storage
    • database_table: Dataframe saving (as tables in dedicated schema)
    • database_pickled: In database as a binary blob
    • database_hdf5: In database as a binary blob
  • Local Filesystem Storage
    • disk_pickled: Pickled file on local disk
    • disk_hdf5: HDF5 file on local disk
    • disk_keras_hdf5: Keras formatted HDF5 file on local disk
  • Cloud Storage
    • cloud_pickled: Pickled file on cloud backend
    • cloud_hdf5: HDF5 file on cloud backend
    • cloud_keras_hdf5: Keras formatted HDF5 file on cloud backend
    Supported Backends:
    • Amazon S3
    • Google Cloud Platform
    • Microsoft Azure
    • Microsoft Onedrive
    • Aurora
    • Backblaze B2
    • DigitalOcean Spaces
    • OpenStack Swift

    Backend is determined by cloud_section in the configuration file

  • Remote filestore saving
    • SCP to remote server
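
The pattern names above follow the "save location, then save format" nomenclature. A minimal sketch of reading a name that way, assuming the first underscore separates the location from the format (`split_save_pattern` is a hypothetical helper, not part of SimpleML):

```python
def split_save_pattern(pattern):
    """Split a save pattern name into (location, save_format).

    Assumption: the first underscore separates the save location from the
    save format, as in the pattern names listed above.
    """
    location, _, save_format = pattern.partition("_")
    return location, save_format


for name in ("database_table", "disk_pickled", "cloud_keras_hdf5"):
    print(name, "->", split_save_pattern(name))
```
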
class simpleml.persistables.saving.AllSaveMixin

Bases: simpleml.persistables.saving.DatabaseTableSaveMixin, simpleml.persistables.saving.DatabasePickleSaveMixin, simpleml.persistables.saving.DiskPickleSaveMixin, simpleml.persistables.saving.DiskHDF5SaveMixin, simpleml.persistables.saving.KerasDiskHDF5SaveMixin, simpleml.persistables.saving.OnedrivePickleSaveMixin, simpleml.persistables.saving.OnedriveHDF5SaveMixin, simpleml.persistables.saving.OnedriveKerasHDF5SaveMixin, simpleml.persistables.saving.CloudPickleSaveMixin, simpleml.persistables.saving.CloudHDF5SaveMixin, simpleml.persistables.saving.CloudKerasHDF5SaveMixin

class simpleml.persistables.saving.CloudBase

Bases: simpleml.persistables.saving.ExternalSaveMixin

Base class to save/load objects via Apache Libcloud

Generic api for all cloud providers so naming convention is extremely important to follow in the config. Please reference libcloud documentation for supported input parameters

```
[cloud]
section = name of the config section to use, ex: s3

[s3]
param = value --> normal key:value syntax. match these to however they are referenced later, examples:
key = abc123
secret = superSecure
region = us-east-1
something_specific_to_s3 = s3_parameter
--- Below are internally referenced SimpleML params ---
driver = S3 --> this must be the Apache Libcloud provider (https://github.com/apache/libcloud/blob/trunk/libcloud/storage/types.py)
connection_params = key,secret,region,something_specific_to_s3 --> this determines the key: value params passed to the constructor (it can be different for each provider)
path = simpleml/specific/root --> similar to disk based home directory, cloud home directory will start relative to here
container = simpleml --> the cloud bucket or container name
```

How this gets used:

```
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

cloud_section = CONFIG.get(CLOUD_SECTION, 'section')
connection_params = CONFIG.getlist(cloud_section, 'connection_params')
root_path = CONFIG.get(cloud_section, 'path', fallback='')

driver_cls = get_driver(getattr(Provider, CONFIG.get(cloud_section, 'driver')))
driver = driver_cls(**{param: CONFIG.get(cloud_section, param) for param in connection_params})
container = driver.get_container(container_name=CONFIG.get(cloud_section, 'container'))
extra = {'content_type': 'application/octet-stream'}

obj = driver.upload_object(LOCAL_FILE_PATH,
                           container=container,
                           object_name=root_path + simpleml_folder_path + filename,
                           extra=extra)
obj = driver.download_object(CLOUD_OBJECT,
                             destination_path=LOCAL_FILE_PATH,
                             overwrite_existing=True,
                             delete_on_failure=True)
```
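
To make the config-to-constructor mapping concrete, the following is a minimal sketch that uses only the standard library `configparser`. The config values are the placeholder ones from the example above, `resolve_cloud_settings` is a hypothetical helper, and the comma split stands in for the `CONFIG.getlist` call, whose implementation is not shown here. No cloud connection is made.

```python
import configparser

# Placeholder config text mirroring the layout described above.
CONFIG_TEXT = """
[cloud]
section = s3

[s3]
driver = S3
connection_params = key,secret,region
key = abc123
secret = superSecure
region = us-east-1
path = simpleml/specific/root
container = simpleml
"""


def resolve_cloud_settings(config):
    """Collect the values that would be handed to the libcloud driver."""
    section = config.get("cloud", "section")
    # Assumption: connection_params is a comma-separated list of option names.
    raw_names = config.get(section, "connection_params")
    param_names = [name.strip() for name in raw_names.split(",") if name.strip()]
    return {
        "driver": config.get(section, "driver"),
        # Only the options named in connection_params reach the constructor.
        "constructor_kwargs": {name: config.get(section, name) for name in param_names},
        "root_path": config.get(section, "path", fallback=""),
        "container": config.get(section, "container"),
    }


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
settings = resolve_cloud_settings(config)
print(settings["driver"], settings["constructor_kwargs"])
```

Options such as `path` and `container` stay out of the constructor arguments because they are not named in `connection_params`.
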

download_from_cloud(folder, filename)

Download any file from cloud to disk

driver
upload_to_cloud(folder, filename)

Upload any file from disk to cloud

class simpleml.persistables.saving.CloudHDF5SaveMixin

Bases: simpleml.persistables.saving.CloudBase

Mixin class to save objects to Cloud in HDF5 format

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.CloudKerasHDF5SaveMixin

Bases: simpleml.persistables.saving.CloudBase

Mixin class to save objects to Cloud in Keras HDF5 format

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.CloudPickleSaveMixin

Bases: simpleml.persistables.saving.CloudBase

Mixin class to save objects to Cloud in pickled format

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.DatabasePickleSaveMixin

Bases: simpleml.persistables.saving.ExternalSaveMixin

Mixin class to save binary objects to a database table

Expects the following available attributes:
  • self._external_file
  • self.id
  • self.object_type
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.DatabaseTableSaveMixin

Bases: simpleml.persistables.saving.ExternalSaveMixin

Mixin class to save dataframes to a database table

Expects the following available attributes:
  • self._external_file
  • self.id
  • self.dataframe
Sets the following attributes:
  • self.filepaths
class simpleml.persistables.saving.DiskHDF5SaveMixin

Bases: simpleml.persistables.saving.ExternalSaveMixin

Mixin class to save objects to disk in HDF5 format with hickle

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.DiskPickleSaveMixin

Bases: simpleml.persistables.saving.ExternalSaveMixin

Mixin class to save objects to disk in pickled format

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
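
The attribute contract listed for these mixins can be illustrated with a small sketch. Everything here is hypothetical: the class and method names, the temporary directory, the `{pattern: [paths]}` shape of `filepaths`, and the boolean `unloaded_externals` are assumptions for illustration, not SimpleML's implementation.

```python
import os
import pickle
import tempfile


class SketchDiskPickleMixin:
    """Hypothetical stand-in showing the expected/set attribute contract."""

    # Assumed location; SimpleML resolves its own pickle directory.
    PICKLE_DIRECTORY = tempfile.mkdtemp()

    def save_external(self):  # hypothetical method name
        # Reads the expected attributes: self.id and self._external_file.
        filename = "{}.pkl".format(self.id)
        with open(os.path.join(self.PICKLE_DIRECTORY, filename), "wb") as f:
            pickle.dump(self._external_file, f)
        # Sets self.filepaths (the dict shape is an assumption).
        self.filepaths = {"disk_pickled": [filename]}

    def load_external(self):  # hypothetical method name
        filename = self.filepaths["disk_pickled"][0]
        with open(os.path.join(self.PICKLE_DIRECTORY, filename), "rb") as f:
            self._external_file = pickle.load(f)
        # Sets self.unloaded_externals (assumed to be a boolean flag).
        self.unloaded_externals = False


class SketchPersistable(SketchDiskPickleMixin):
    def __init__(self, id, external_file):
        self.id = id
        self._external_file = external_file


item = SketchPersistable("abc123", {"weights": [1, 2, 3]})
item.save_external()
item._external_file = None  # simulate an object whose external file is not loaded
item.load_external()
print(item.filepaths, item._external_file)
```
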
class simpleml.persistables.saving.ExternalSaveMixin

Bases: object

Base class with save methods. Subclasses should define the saving and loading patterns

static df_to_sql(engine, df, table, dtype=None, schema='public', if_exists='replace', sep='|', encoding='utf8', index=False)

Utility to bulk insert pandas dataframe via copy from

Parameters:
  • df – dataframe to insert
  • table – destination table
  • dtype – column schema of destination table
  • schema – destination schema
  • if_exists – what to do if destination table exists; valid inputs are: [replace, append, fail]
  • sep – separator key between cells
  • encoding – character encoding to use
  • index – whether to output index with data
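
As a rough illustration of the bulk-insert idea, the sketch below prepares a delimited in-memory buffer and a matching statement, assuming "copy from" refers to PostgreSQL's `COPY ... FROM STDIN`. It uses plain row tuples in place of a dataframe and never touches a database. `build_copy_payload` is a hypothetical helper, not the body of `df_to_sql`.

```python
import csv
import io


def build_copy_payload(rows, table, schema="public", sep="|"):
    """Return (statement, buffer) for a PostgreSQL COPY ... FROM STDIN load.

    Assumption: rows is an iterable of tuples standing in for dataframe rows.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)
    buffer.seek(0)  # rewind so a database driver could read from the start
    statement = "COPY {}.{} FROM STDIN WITH (FORMAT csv, DELIMITER '{}')".format(
        schema, table, sep
    )
    return statement, buffer


statement, buffer = build_copy_payload([(1, "a"), (2, "b")], table="my_table")
print(statement)
print(buffer.read())
```

A database driver would then stream the buffer to the server in one round trip, which is what makes this approach faster than row-by-row inserts.
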

static hickle_object(obj, filepath)

Serializes an object to the filesystem in HDF5 format.

Prepends path to SimpleML HDF5 directory before saving. ONLY pass in a relative filepath from that location
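
One likely reason for the relative-path requirement can be shown with a short sketch. It assumes the directory is prepended with `os.path.join`-style semantics (an assumption about the implementation) and uses a made-up base directory. `posixpath` is used so the result is the same on every platform.

```python
import posixpath

# Made-up base directory; SimpleML resolves its own HDF5 directory.
HDF5_DIRECTORY = "/home/user/.simpleml/hdf5"


def resolve(filepath):
    """Prepend the base directory the way a path join would."""
    return posixpath.join(HDF5_DIRECTORY, filepath)


print(resolve("models/abc.h5"))  # /home/user/.simpleml/hdf5/models/abc.h5
print(resolve("/tmp/abc.h5"))    # /tmp/abc.h5 (the base directory is discarded)
```

With join semantics, an absolute path silently bypasses the base directory, so the file would land outside the expected location.
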

static load_hickled_object(filepath)

Loads an object from the filesystem.

Prepends path to SimpleML HDF5 directory before loading. ONLY pass in a relative filepath from that location

static load_keras_object(filepath)

Loads a Keras object from the filesystem.

Prepends path to SimpleML HDF5 directory before loading. ONLY pass in a relative filepath from that location

static load_pickled_object(filepath, stream=False)

Loads an object from a serialized string or from the filesystem. When stream is True, filepath is treated as the serialized string and the object is loaded directly from it.

Prepends path to SimpleML Pickle directory before loading. ONLY pass in a relative filepath from that location

static pickle_object(obj, filepath=None)

Pickles an object to a string or to the filesystem. If filepath is None, the serialized string is returned

Prepends path to SimpleML Pickle directory before saving. ONLY pass in a relative filepath from that location
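
The two modes described for `pickle_object` and `load_pickled_object` can be sketched with the standard library. The functions below are hypothetical stand-ins that mirror the documented signatures, and the temporary directory replaces the SimpleML Pickle directory. They are not the library's code.

```python
import os
import pickle
import tempfile

# Assumed location; SimpleML resolves its own pickle directory.
PICKLE_DIRECTORY = tempfile.mkdtemp()


def sketch_pickle_object(obj, filepath=None):
    """Return the serialized bytes when filepath is None, else write to disk."""
    if filepath is None:
        return pickle.dumps(obj)
    with open(os.path.join(PICKLE_DIRECTORY, filepath), "wb") as f:
        pickle.dump(obj, f)


def sketch_load_pickled_object(filepath, stream=False):
    """Load from serialized bytes when stream is True, else from disk."""
    if stream:
        return pickle.loads(filepath)
    with open(os.path.join(PICKLE_DIRECTORY, filepath), "rb") as f:
        return pickle.load(f)


# String mode: nothing touches the filesystem.
serialized = sketch_pickle_object({"a": 1})
restored_from_stream = sketch_load_pickled_object(serialized, stream=True)

# File mode: only a relative path is passed in.
sketch_pickle_object([1, 2, 3], "example.pkl")
restored_from_disk = sketch_load_pickled_object("example.pkl")
print(restored_from_stream, restored_from_disk)
```
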

static save_keras_object(obj, filepath)

Serializes an object to the filesystem in Keras HDF5 format.

Prepends path to SimpleML HDF5 directory before saving. ONLY pass in a relative filepath from that location

class simpleml.persistables.saving.KerasDiskHDF5SaveMixin

Bases: simpleml.persistables.saving.ExternalSaveMixin

Mixin class to save objects to disk in Keras’s HDF5 format.

Keras’s internal persistence mechanism utilizes HDF5 and implements a custom pattern

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.OnedriveBase

Bases: simpleml.persistables.saving.ExternalSaveMixin

Base class to save/load objects to Microsoft Onedrive

authenticate_onedrive()

Authenticate with Onedrive Oauth2

client
create_onedrive_schema(root_folder='SIMPLEML')

Assumes authentication has already happened and self.client is assigned. Checks whether the folders are already present and creates them if not

download_from_onedrive(bucket, filename)

Download any file from onedrive to disk

Steps:
  1. Authenticate
  2. Get Folder IDs
  3. Download
onedrive_filestore_id
onedrive_hdf5_id
onedrive_pickle_id
onedrive_root_id
upload_to_onedrive(bucket, filename)

Upload any file from disk to onedrive

Steps:
  1. Authenticate
  2. Create Schema
  3. Upload
class simpleml.persistables.saving.OnedriveHDF5SaveMixin

Bases: simpleml.persistables.saving.OnedriveBase

Mixin class to save objects to Microsoft Onedrive in HDF5 format

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.OnedriveKerasHDF5SaveMixin

Bases: simpleml.persistables.saving.OnedriveBase

Mixin class to save objects to Microsoft Onedrive in Keras HDF5 format

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals
class simpleml.persistables.saving.OnedrivePickleSaveMixin

Bases: simpleml.persistables.saving.OnedriveBase

Mixin class to save objects to Microsoft Onedrive in pickled format

Expects the following available attributes:
  • self._external_file
  • self.id
Sets the following attributes:
  • self.filepaths
  • self.unloaded_externals