simpleml.persistables.saving module
Module defining the mixins that support different persistence patterns for external objects.
Nomenclature: each pattern is named <save location>_<save format>.
- Database Storage
    - database_table: Dataframe saving (as tables in a dedicated schema)
    - database_pickled: In the database as a binary blob (pickled)
    - database_hdf5: In the database as a binary blob (HDF5)
- Local Filesystem Storage
    - disk_pickled: Pickled file on local disk
    - disk_hdf5: HDF5 file on local disk
    - disk_keras_hdf5: Keras-formatted HDF5 file on local disk
- Cloud Storage
    - cloud_pickled: Pickled file on a cloud backend
    - cloud_hdf5: HDF5 file on a cloud backend
    - cloud_keras_hdf5: Keras-formatted HDF5 file on a cloud backend
    - Supported Backends:
        - Amazon S3
        - Google Cloud Platform
        - Microsoft Azure
        - Microsoft OneDrive
        - Aurora
        - Backblaze B2
        - DigitalOcean Spaces
        - OpenStack Swift
    The backend is determined by cloud_section in the configuration file.
- Remote Filestore Saving
    - SCP to remote server
class simpleml.persistables.saving.AllSaveMixin
Bases:
- simpleml.persistables.saving.DatabaseTableSaveMixin
- simpleml.persistables.saving.DatabasePickleSaveMixin
- simpleml.persistables.saving.DiskPickleSaveMixin
- simpleml.persistables.saving.DiskHDF5SaveMixin
- simpleml.persistables.saving.KerasDiskHDF5SaveMixin
- simpleml.persistables.saving.OnedrivePickleSaveMixin
- simpleml.persistables.saving.OnedriveHDF5SaveMixin
- simpleml.persistables.saving.OnedriveKerasHDF5SaveMixin
- simpleml.persistables.saving.CloudPickleSaveMixin
- simpleml.persistables.saving.CloudHDF5SaveMixin
- simpleml.persistables.saving.CloudKerasHDF5SaveMixin
class simpleml.persistables.saving.CloudBase
Bases: simpleml.persistables.saving.ExternalSaveMixin
Base class to save/load objects via Apache Libcloud.
This is a generic API for all cloud providers, so following the naming convention in the config is extremely important. Please reference the Libcloud documentation for supported input parameters.
```
[cloud]
; name of the config section to use, ex: s3
section = s3

[s3]
; normal key = value syntax; match these to however they are referenced later, examples:
key = abc123
secret = superSecure
region = us-east-1
something_specific_to_s3 = s3_parameter

; --- Below are internally referenced SimpleML params ---
; driver must be the Apache Libcloud provider (https://github.com/apache/libcloud/blob/trunk/libcloud/storage/types.py)
driver = S3
; determines the key = value params passed to the driver constructor (can differ for each provider)
connection_params = key,secret,region,something_specific_to_s3
; similar to the disk-based home directory; the cloud home directory starts relative to here
path = simpleml/specific/root
; the cloud bucket or container name
container = simpleml
```
How this gets used:
```
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

# CONFIG is SimpleML's loaded configuration; CLOUD_SECTION names its [cloud] section
cloud_section = CONFIG.get(CLOUD_SECTION, 'section')
connection_params = CONFIG.getlist(cloud_section, 'connection_params')
root_path = CONFIG.get(cloud_section, 'path', fallback='')

# Look up the Libcloud driver class by provider name (e.g. S3) and pass only the listed params
driver_cls = get_driver(getattr(Provider, CONFIG.get(cloud_section, 'driver')))
driver = driver_cls(**{param: CONFIG.get(cloud_section, param) for param in connection_params})
container = driver.get_container(container_name=CONFIG.get(cloud_section, 'container'))
extra = {'content_type': 'application/octet-stream'}

# Upload: LOCAL_FILE_PATH, simpleml_folder_path and filename are placeholders
obj = driver.upload_object(LOCAL_FILE_PATH,
                           container=container,
                           object_name=root_path + simpleml_folder_path + filename,
                           extra=extra)

# Download: CLOUD_OBJECT is a Libcloud Object (e.g. the one returned by upload_object);
# download_object returns True on success rather than the object itself
downloaded = driver.download_object(CLOUD_OBJECT,
                                    destination_path=LOCAL_FILE_PATH,
                                    overwrite_existing=True,
                                    delete_on_failure=True)
```
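The configuration contract described above can be exercised with the standard library alone. The following is a minimal sketch under stated assumptions, not SimpleML's own loader: it uses a plain configparser.ConfigParser with a "list" converter so that a getlist() method exists (mirroring the CONFIG.getlist call above), and the helper names make_parser and driver_kwargs are hypothetical. It stops short of calling Libcloud and returns the keyword arguments that would be passed to the driver constructor.

```python
import configparser
from typing import Dict, List

# Hypothetical config text following the documented layout
CONFIG_TEXT = """
[cloud]
section = s3

[s3]
key = abc123
secret = superSecure
region = us-east-1
driver = S3
connection_params = key,secret,region
path = simpleml/specific/root
container = simpleml
"""


def _split_list(value: str) -> List[str]:
    # Split a comma-separated option into a list, dropping blanks
    return [item.strip() for item in value.split(",") if item.strip()]


def make_parser(text: str) -> configparser.ConfigParser:
    # The "list" converter adds a getlist() method to the parser
    parser = configparser.ConfigParser(converters={"list": _split_list})
    parser.read_string(text)
    return parser


def driver_kwargs(parser: configparser.ConfigParser, cloud_key: str = "cloud") -> Dict[str, str]:
    # Resolve the active provider section, then keep only the params it lists
    section = parser.get(cloud_key, "section")
    params = parser.getlist(section, "connection_params")
    return {param: parser.get(section, param) for param in params}
```

Because connection_params drives the comprehension, adding a provider-specific key only requires listing it there.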
driver
class simpleml.persistables.saving.CloudHDF5SaveMixin
Bases: simpleml.persistables.saving.CloudBase
Mixin class to save objects to the cloud in HDF5 format.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
class simpleml.persistables.saving.CloudKerasHDF5SaveMixin
Bases: simpleml.persistables.saving.CloudBase
Mixin class to save objects to the cloud in Keras HDF5 format.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
class simpleml.persistables.saving.CloudPickleSaveMixin
Bases: simpleml.persistables.saving.CloudBase
Mixin class to save objects to the cloud in pickled format.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
class simpleml.persistables.saving.DatabasePickleSaveMixin
Bases: simpleml.persistables.saving.ExternalSaveMixin
Mixin class to save binary objects to a database table.
- Expects the following available attributes:
    - self._external_file
    - self.id
    - self.object_type
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
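As a rough illustration of storing a pickled object as a binary blob, the sketch below uses an in-memory sqlite3 database as a stand-in. SimpleML's actual database backend, table name, and columns are not described here, so binary_objects, its columns, and both function names are hypothetical; the object_type column only mirrors the self.object_type attribute listed above.

```python
import pickle
import sqlite3
from typing import Any

# Hypothetical table layout: one row per persistable id, payload stored as a BLOB
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS binary_objects "
    "(id TEXT PRIMARY KEY, object_type TEXT, payload BLOB)"
)


def save_pickled_to_db(conn: sqlite3.Connection, obj_id: str, object_type: str, obj: Any) -> None:
    # Serialize in memory, then store the bytes as a binary blob
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO binary_objects (id, object_type, payload) VALUES (?, ?, ?)",
        (obj_id, object_type, payload),
    )
    conn.commit()


def load_pickled_from_db(conn: sqlite3.Connection, obj_id: str) -> Any:
    row = conn.execute("SELECT payload FROM binary_objects WHERE id = ?", (obj_id,)).fetchone()
    if row is None:
        raise KeyError(obj_id)
    # sqlite3 returns BLOB columns as bytes, which pickle.loads accepts directly
    return pickle.loads(row[0])
```

Only unpickle blobs written by a trusted process, since pickle.loads can execute arbitrary code.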
class simpleml.persistables.saving.DatabaseTableSaveMixin
Bases: simpleml.persistables.saving.ExternalSaveMixin
Mixin class to save dataframes to a database table.
- Expects the following available attributes:
    - self._external_file
    - self.id
    - self.dataframe
- Sets the following attributes:
    - self.filepaths
class simpleml.persistables.saving.DiskHDF5SaveMixin
Bases: simpleml.persistables.saving.ExternalSaveMixin
Mixin class to save objects to disk in HDF5 format with hickle.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
class simpleml.persistables.saving.DiskPickleSaveMixin
Bases: simpleml.persistables.saving.ExternalSaveMixin
Mixin class to save objects to disk in pickled format.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
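The "Expects / Sets" contract shared by these mixins can be sketched with a toy disk-pickle mixin. Everything here is hypothetical: the class names, the pickle_dir attribute, the method names save_external and load_external, and the {"disk_pickled": [filename]} layout of filepaths. It only shows how a mixin reads attributes that the host class provides and writes back the ones it promises to set.

```python
import os
import pickle
from typing import Any, Dict, List


class HypotheticalDiskPickleMixin:
    """Toy mixin mirroring the documented contract; not SimpleML's code."""

    def save_external(self) -> None:
        # Expects the host class to provide self.id, self._external_file and self.pickle_dir
        filename = f"{self.id}.pkl"
        with open(os.path.join(self.pickle_dir, filename), "wb") as f:
            pickle.dump(self._external_file, f)
        # Sets self.filepaths (this dict layout is an assumption)
        self.filepaths: Dict[str, List[str]] = {"disk_pickled": [filename]}

    def load_external(self) -> None:
        # Reads back the file recorded in self.filepaths
        filename = self.filepaths["disk_pickled"][0]
        with open(os.path.join(self.pickle_dir, filename), "rb") as f:
            self._external_file = pickle.load(f)
        # Sets self.unloaded_externals to signal the external object is now in memory
        self.unloaded_externals = False


class ToyPersistable(HypotheticalDiskPickleMixin):
    """Hypothetical host class supplying the attributes the mixin expects."""

    def __init__(self, id_: str, external: Any, pickle_dir: str) -> None:
        self.id = id_
        self._external_file = external
        self.pickle_dir = pickle_dir
        self.unloaded_externals = True
```

Keeping the host class responsible for id and _external_file is what lets a single class combine several save mixins, as AllSaveMixin does.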
class simpleml.persistables.saving.ExternalSaveMixin
Bases: object
Base class with save methods. Subclasses should define the saving and loading patterns.
static df_to_sql(engine, df, table, dtype=None, schema='public', if_exists='replace', sep='|', encoding='utf8', index=False)
Utility to bulk insert a pandas dataframe via COPY FROM.
Parameters:
- df – dataframe to insert
- table – destination table
- dtype – column schema of the destination table
- schema – destination schema
- if_exists – what to do if the destination table exists; valid inputs are replace, append, fail
- sep – separator between cells
- encoding – character encoding to use
- index – whether to output the index with the data
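The implementation of df_to_sql is not shown here. The stdlib-only sketch below illustrates the general COPY FROM bulk-load technique it refers to, using plain row tuples instead of a DataFrame (with pandas, DataFrame.to_csv can fill the same kind of buffer). It assumes PostgreSQL and a psycopg2-style cursor exposing copy_expert(sql, file); copy_rows is a hypothetical name, and table creation, dtype, and if_exists handling are left out.

```python
import csv
import io
from typing import Any, Iterable, Sequence


def copy_rows(cursor: Any, rows: Iterable[Sequence[Any]], table: str,
              schema: str = "public", sep: str = "|") -> str:
    """Stream rows into an existing table with a single COPY ... FROM STDIN.

    Assumes a psycopg2-style cursor with copy_expert(sql, file) and trusted
    schema/table names (they are interpolated into the SQL, not escaped).
    """
    # Serialize every row into an in-memory CSV buffer using the chosen separator
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)  # None becomes an empty field, which CSV-format COPY reads as NULL
    buffer.seek(0)

    # CSV format (rather than text format) so quoted fields containing sep round-trip
    sql = f"COPY {schema}.{table} FROM STDIN WITH (FORMAT csv, DELIMITER '{sep}')"
    cursor.copy_expert(sql, buffer)
    return sql
```

A single COPY avoids one round trip per row, which is the usual reason to prefer it over row-by-row INSERTs for large frames.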
static hickle_object(obj, filepath)
Serializes an object to the filesystem in HDF5 format.
Prepends the path to the SimpleML HDF5 directory before saving. ONLY pass in a filepath relative to that location.
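The insistence on relative paths matters because both os.path.join and pathlib's / operator discard the base directory when the second component is absolute. The hypothetical helper resolve_under below sketches one way to enforce the rule before handing a path to a serializer; it is not SimpleML's implementation.

```python
from pathlib import Path


def resolve_under(base_dir: str, filepath: str) -> Path:
    """Join a relative filepath onto base_dir, rejecting paths that would escape it."""
    candidate = Path(filepath)
    # An absolute path would silently replace base_dir when joined, so refuse it
    if candidate.is_absolute():
        raise ValueError(f"expected a path relative to {base_dir!r}, got {filepath!r}")
    base = Path(base_dir).resolve()
    full = (base / candidate).resolve()
    # Also refuse '..' segments that climb out of the base directory
    if full != base and base not in full.parents:
        raise ValueError(f"{filepath!r} escapes {base_dir!r}")
    return full
```

The resolved path could then be handed to the HDF5 serializer, for example hickle.dump(obj, str(path)).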
static load_hickled_object(filepath)
Loads an object from the filesystem.
Prepends the path to the SimpleML HDF5 directory before loading. ONLY pass in a filepath relative to that location.
static load_keras_object(filepath)
Loads a Keras object from the filesystem.
Prepends the path to the SimpleML HDF5 directory before loading. ONLY pass in a filepath relative to that location.
static load_pickled_object(filepath, stream=False)
Loads an object from a serialized string or from the filesystem. When stream is True, it tries to load the object directly from the serialized string passed as filepath.
Prepends the path to the SimpleML Pickle directory before loading. ONLY pass in a filepath relative to that location.
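The two modes of load_pickled_object can be sketched as follows, assuming the standard-library pickle module and a caller-supplied base directory standing in for SimpleML's Pickle directory; load_pickled and base_dir are hypothetical names. With stream=True the first argument is treated as the serialized payload itself (bytes in Python 3), otherwise as a relative file path.

```python
import os
import pickle
from typing import Any, Union


def load_pickled(source: Union[str, bytes], base_dir: str, stream: bool = False) -> Any:
    """Hypothetical stand-in for load_pickled_object's two modes."""
    if stream:
        # stream=True: source already holds the serialized bytes
        return pickle.loads(source)
    # stream=False: source is a path relative to the pickle directory
    with open(os.path.join(base_dir, source), "rb") as f:
        return pickle.load(f)
```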
-
static
class simpleml.persistables.saving.KerasDiskHDF5SaveMixin
Bases: simpleml.persistables.saving.ExternalSaveMixin
Mixin class to save objects to disk in Keras's HDF5 format. Keras's internal persistence mechanism uses HDF5 and implements a custom pattern.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
class simpleml.persistables.saving.OnedriveBase
Bases: simpleml.persistables.saving.ExternalSaveMixin
Base class to save/load objects to/from Microsoft OneDrive.
client
create_onedrive_schema(root_folder='SIMPLEML')
Assumes the client is already authenticated and assigned to self.client. Checks whether the folders are already present and creates them if not.
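The check-then-create behavior of create_onedrive_schema is a get-or-create pattern, which makes the call safe to repeat. The sketch below runs that pattern against a hypothetical in-memory folder store rather than the real OneDrive API; the InMemoryFolders class, its find/create methods, and the child folder names (inferred from the onedrive_pickle_id, onedrive_hdf5_id, and onedrive_filestore_id attributes) are all assumptions.

```python
from typing import Dict, Optional, Tuple


class InMemoryFolders:
    """Hypothetical stand-in for a OneDrive client: maps (parent_id, name) -> folder id."""

    def __init__(self) -> None:
        self._folders: Dict[Tuple[str, str], str] = {}
        self.created = 0

    def find(self, parent_id: str, name: str) -> Optional[str]:
        return self._folders.get((parent_id, name))

    def create(self, parent_id: str, name: str) -> str:
        self.created += 1
        folder_id = f"id-{self.created}"
        self._folders[(parent_id, name)] = folder_id
        return folder_id


def get_or_create(client: InMemoryFolders, parent_id: str, name: str) -> str:
    # Reuse an existing folder when present so repeated calls are idempotent
    existing = client.find(parent_id, name)
    return existing if existing is not None else client.create(parent_id, name)


def create_schema(client: InMemoryFolders, root_folder: str = "SIMPLEML") -> Dict[str, str]:
    # Child folder names are assumptions inferred from the onedrive_*_id attributes
    root_id = get_or_create(client, "root", root_folder)
    ids = {"root": root_id}
    for child in ("pickle", "hdf5", "filestore"):
        ids[child] = get_or_create(client, root_id, child)
    return ids
```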
download_from_onedrive(bucket, filename)
Downloads any file from OneDrive to disk.
- Steps:
    - Authenticate
    - Get folder IDs
    - Download
onedrive_filestore_id
onedrive_hdf5_id
onedrive_pickle_id
onedrive_root_id
class
simpleml.persistables.saving.
OnedriveHDF5SaveMixin
[source]¶ Bases:
simpleml.persistables.saving.OnedriveBase
Mixin class to save objects to Microsoft Onedrive in HDF5 format
- Expects the following available attributes:
- self._external_file
- self.id
- Sets the following attributes:
- self.filepaths
- self.unloaded_externals
class simpleml.persistables.saving.OnedriveKerasHDF5SaveMixin
Bases: simpleml.persistables.saving.OnedriveBase
Mixin class to save objects to Microsoft OneDrive in Keras HDF5 format.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals
class simpleml.persistables.saving.OnedrivePickleSaveMixin
Bases: simpleml.persistables.saving.OnedriveBase
Mixin class to save objects to Microsoft OneDrive in pickled format.
- Expects the following available attributes:
    - self._external_file
    - self.id
- Sets the following attributes:
    - self.filepaths
    - self.unloaded_externals