simpleml.save_patterns

Package for artifact persistence. Bindings are automatically included for SimpleML persistables, but patterns can be used for any objects or frameworks.

Patterns are loaded into the global registry on import, and more can be added externally by decorating a class with SavePatternDecorators.register_save_pattern.

Patterns can be named anything since they are only mappings in the registry. Convention is -> Location : Serializer : Format(s)

  • Database Storage
    • database_table: Dataframe saving (as tables in dedicated schema)

    • database_pickled: In database as a binary blob

    • database_hdf5: In database as a binary blob

  • Local Filesystem Storage
    • disk_pickled: Pickled file on local disk

    • disk_hdf5: HDF5 file on local disk

    • disk_keras_hdf5: Keras formatted HDF5 file on local disk

  • Cloud Storage
    • cloud_pickled: Pickled file on cloud backend

    • cloud_hdf5: HDF5 file on cloud backend

    • cloud_keras_hdf5: Keras formatted HDF5 file on cloud backend

    Supported Backends:
    • Amazon S3

    • Google Cloud Platform

    • Microsoft Azure

    • Microsoft OneDrive

    • Aurora

    • Backblaze B2

    • DigitalOcean Spaces

    • OpenStack Swift

    Backend is determined by cloud_section in the configuration file

  • Remote filestore saving
    • SCP to remote server
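Because a pattern name is only a key in the registry, the idea can be illustrated with plain dictionaries. This is a minimal sketch, not SimpleML's actual registry; SAVE_REGISTRY, LOAD_REGISTRY, and CloudPickleExample are hypothetical names introduced for illustration.

```python
# Hypothetical stand-ins for a global registry: pattern name -> handler class.
SAVE_REGISTRY = {}
LOAD_REGISTRY = {}


class CloudPickleExample:
    """Placeholder handler; a real pattern would implement save and load."""


# The key is free-form text. Here it loosely follows the style of the
# built-in names listed above (e.g. disk_pickled, cloud_hdf5).
SAVE_REGISTRY["cloud_pickled_example"] = CloudPickleExample
LOAD_REGISTRY["cloud_pickled_example"] = CloudPickleExample

# Looking a pattern up is an ordinary dictionary access.
handler = SAVE_REGISTRY["cloud_pickled_example"]
print(handler.__name__)
```

Keeping separate save and load mappings in the sketch mirrors the note further down that different classes may handle saving and loading for the same pattern.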

Package Contents

Classes

BaseSavePattern

Base class for save patterns (registered wrappers for the collection of serializers and deserializers)

BaseSerializer

CloudpickleDiskSavePattern

Save pattern implementation to save objects to disk in pickled format

CloudpickleFileSerializer

CloudpickleLibcloudSavePattern

Save pattern implementation to save objects to cloud via apache-libcloud in pickled format

DaskCSVSerializer

DaskDiskCSVSavePattern

Save pattern implementation to save dask objects to disk in csv format

DaskDiskJSONSavePattern

Save pattern implementation to save dask objects to disk in json format

DaskDiskParquetSavePattern

Save pattern implementation to save dask objects to disk in parquet format

DaskJSONSerializer

DaskLibcloudCSVSavePattern

Save pattern implementation to save dask objects to cloud via apache-libcloud in csv format

DaskLibcloudJSONSavePattern

Save pattern implementation to save dask objects to cloud via apache-libcloud in json format

DaskLibcloudParquetSavePattern

Save pattern implementation to save dask objects to cloud via apache-libcloud in parquet format

DaskParquetSerializer

FilestoreCopyFileLocation

FilestoreCopyFilesLocation

FilestoreCopyFolderLocation

FilestorePassthroughLocation

KerasDiskH5SavePattern

Save pattern implementation to save keras objects to disk in h5 format

KerasDiskSavedModelSavePattern

Save pattern implementation to save keras objects to disk in savedModel format

KerasH5Serializer

Uses Keras H5 serialization (legacy behavior)

KerasLibcloudH5SavePattern

Save pattern implementation to save keras objects to cloud via apache-libcloud in h5 format

KerasLibcloudSavedModelSavePattern

Save pattern implementation to save keras objects to cloud via apache-libcloud in savedModel format

KerasSavedModelSerializer

Uses Tensorflow SavedModel serialization

LibcloudCopyFileLocation

LibcloudCopyFilesLocation

Libcloud transport for many individual files

LibcloudCopyFolderLocation

Libcloud doesn't have a notion of folder objects, so iterate through filepaths individually

PandasCSVSerializer

PandasDiskCSVSavePattern

Save pattern implementation to save pandas objects to disk in csv format

PandasDiskJSONSavePattern

Save pattern implementation to save pandas objects to disk in json format

PandasDiskParquetSavePattern

Save pattern implementation to save pandas objects to disk in parquet format

PandasJSONSerializer

PandasLibcloudCSVSavePattern

Save pattern implementation to save pandas objects to cloud via apache-libcloud in csv format

PandasLibcloudJSONSavePattern

Save pattern implementation to save pandas objects to cloud via apache-libcloud in json format

PandasLibcloudParquetSavePattern

Save pattern implementation to save pandas objects to cloud via apache-libcloud in parquet format

PandasParquetSerializer

SavePatternDecorators

Decorators that can be used for registering methods for loading and saving.

Functions

deregister_save_pattern(cls = None, save_pattern = None, save = True, load = True)

Deregister the class to use for saving and loading for the particular pattern

register_save_pattern(cls, save_pattern = None, save = True, load = True, overwrite = False)

Register the class to use for saving and loading for the particular pattern

Attributes

LOGGER

(Cloud)Pickle Save Patterns

PICKLE_DIRECTORY

__author__

simpleml.save_patterns.LOGGER[source]

(Cloud)Pickle Save Patterns

simpleml.save_patterns.PICKLE_DIRECTORY = pickle/[source]
simpleml.save_patterns.__author__ = Elisha Yadgaran[source]
class simpleml.save_patterns.BaseSavePattern[source]

Bases: object

Base class for save patterns (registered wrappers for the collection of serializers and deserializers)

deserializers :Tuple[Type[BaseSerializer]]
serializers :Tuple[Type[BaseSerializer]]
classmethod load(cls, **kwargs)

The load method invoked

Return type

Any

classmethod save(cls, **kwargs)

Routine to iterate through serializers returning the final metadata

Return type

Dict[str, str]
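The save routine described above can be pictured as a small pipeline. The following is a sketch under stated assumptions, not the library's implementation: it assumes each serializer's returned dictionary is merged into the keyword arguments handed to the next serializer, and UppercaseStep, DescribeStep, and ExamplePattern are hypothetical classes.

```python
from typing import Any, Dict, Tuple, Type


class UppercaseStep:
    """Hypothetical first step: transforms the object."""

    @staticmethod
    def serialize(obj: str, **kwargs: Any) -> Dict[str, str]:
        return {"obj": obj.upper()}


class DescribeStep:
    """Hypothetical final step: reports metadata about the result."""

    @staticmethod
    def serialize(obj: str, **kwargs: Any) -> Dict[str, str]:
        return {"length": str(len(obj))}


class ExamplePattern:
    serializers: Tuple[Type, ...] = (UppercaseStep, DescribeStep)

    @classmethod
    def save(cls, **kwargs: Any) -> Dict[str, str]:
        params = dict(kwargs)
        for serializer in cls.serializers:
            # Assumption: each step's output is merged into the keyword
            # arguments that the next step receives.
            params.update(serializer.serialize(**params))
        return params


metadata = ExamplePattern.save(obj="model")
print(metadata)
```

Under this assumption, a location step placed after a format step can pick up the filepath produced by the step before it.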

class simpleml.save_patterns.BaseSerializer[source]

Bases: object

abstract static deserialize(**kwargs)
Return type

Dict[str, Any]

abstract static serialize(**kwargs)
Return type

Dict[str, str]
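A custom serializer only needs to supply the two static methods. The sketch below uses a local stand-in for the base class so that it runs without SimpleML installed; BaseSerializerSketch and JsonStringSerializer are hypothetical names, and the keyword names obj and payload are illustrative assumptions.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseSerializerSketch(ABC):
    """Local stand-in mirroring the documented abstract interface."""

    @staticmethod
    @abstractmethod
    def serialize(**kwargs: Any) -> Dict[str, str]:
        ...

    @staticmethod
    @abstractmethod
    def deserialize(**kwargs: Any) -> Dict[str, Any]:
        ...


class JsonStringSerializer(BaseSerializerSketch):
    """Hypothetical serializer that round-trips an object through JSON text."""

    @staticmethod
    def serialize(obj: Any = None, **kwargs: Any) -> Dict[str, str]:
        return {"payload": json.dumps(obj, sort_keys=True)}

    @staticmethod
    def deserialize(payload: str = "null", **kwargs: Any) -> Dict[str, Any]:
        return {"obj": json.loads(payload)}


state = JsonStringSerializer.serialize(obj={"alpha": 1})
restored = JsonStringSerializer.deserialize(**state)
print(restored["obj"])
```

Accepting and ignoring extra keyword arguments lets a serializer sit in a chain where other steps contribute parameters it does not need.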

class simpleml.save_patterns.CloudpickleDiskSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save objects to disk in pickled format

SAVE_PATTERN = disk_pickled[source]
deserializers[source]
serializers[source]
classmethod load(cls, legacy=None, **kwargs)[source]

Catch for legacy filepath data to dynamically update to new convention

Parameters

legacy (Optional[str]) –

class simpleml.save_patterns.CloudpickleFileSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

Return type

Dict[str, Any]

static serialize(obj, filepath, format_directory=PICKLE_DIRECTORY, format_extension='.pkl', destination_directory='system_temp', **kwargs)
Parameters
  • obj (Any) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]
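The serialize and deserialize signatures above suggest a round trip through a file. The following sketch makes several assumptions that are not confirmed by this page: that the stored path is composed as format_directory + filepath + format_extension, that the returned dictionary carries that relative filepath, and that 'system_temp' names a root directory (replaced here by an explicit root argument). It also swaps in the standard library pickle module for cloudpickle so the block has no third-party dependencies.

```python
import os
import pickle
import tempfile
from typing import Any, Dict

PICKLE_DIRECTORY = "pickle/"


def serialize_sketch(obj: Any, filepath: str, root: str,
                     format_directory: str = PICKLE_DIRECTORY,
                     format_extension: str = ".pkl") -> Dict[str, str]:
    # Assumption: relative path = format_directory + filepath + extension.
    relative_path = os.path.join(format_directory, filepath + format_extension)
    full_path = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as handle:
        pickle.dump(obj, handle)
    return {"filepath": relative_path}


def deserialize_sketch(filepath: str, root: str) -> Dict[str, Any]:
    with open(os.path.join(root, filepath), "rb") as handle:
        return {"obj": pickle.load(handle)}


# A temporary directory stands in for the 'system_temp' location.
with tempfile.TemporaryDirectory() as root:
    metadata = serialize_sketch({"weights": [1, 2, 3]}, "my_model", root)
    restored = deserialize_sketch(metadata["filepath"], root)

print(metadata["filepath"])
```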

class simpleml.save_patterns.CloudpickleLibcloudSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save objects to cloud via apache-libcloud in pickled format

SAVE_PATTERN = cloud_pickled[source]
deserializers[source]
serializers[source]
classmethod load(cls, legacy=None, **kwargs)[source]

Catch for legacy filepath data to dynamically update to new convention

Parameters

legacy (Optional[str]) –

class simpleml.save_patterns.DaskCSVSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepaths, source_directory='system_temp', **kwargs)
Parameters
  • filepaths (List[str]) –

  • source_directory (str) –

Return type

Dict[str, Any]

static serialize(obj, filepath, format_directory=CSV_DIRECTORY, format_extension='.csv', destination_directory='system_temp', **kwargs)
Parameters
  • obj (simpleml.imports.ddDataFrame) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.DaskDiskCSVSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save dask objects to disk in csv format

SAVE_PATTERN = dask_disk_csv[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.DaskDiskJSONSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save dask objects to disk in json format

SAVE_PATTERN = dask_disk_json[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.DaskDiskParquetSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save dask objects to disk in parquet format

SAVE_PATTERN = dask_disk_parquet[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.DaskJSONSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepaths, source_directory='system_temp', **kwargs)
Parameters
  • filepaths (List[str]) –

  • source_directory (str) –

Return type

Dict[str, Any]

static serialize(obj, filepath, format_directory=JSON_DIRECTORY, format_extension='.jsonl', destination_directory='system_temp', **kwargs)
Parameters
  • obj (simpleml.imports.ddDataFrame) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.DaskLibcloudCSVSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save dask objects to cloud via apache-libcloud in csv format

SAVE_PATTERN = dask_libcloud_csv[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.DaskLibcloudJSONSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save dask objects to cloud via apache-libcloud in json format

SAVE_PATTERN = dask_libcloud_json[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.DaskLibcloudParquetSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save dask objects to cloud via apache-libcloud in parquet format

SAVE_PATTERN = dask_libcloud_parquet[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.DaskParquetSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

Return type

Dict[str, Any]

static serialize(obj, filepath, format_directory=PARQUET_DIRECTORY, format_extension='.parquet', destination_directory='system_temp', **kwargs)
Parameters
  • obj (simpleml.imports.ddDataFrame) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.FilestoreCopyFileLocation[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='filestore', destination_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

static serialize(filepath, source_directory='system_temp', destination_directory='filestore', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]
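The location classes move an already serialized file between directories. As a rough sketch of that step, the function below copies a relative filepath from one root directory to another with the standard library. It assumes that 'system_temp' and 'filestore' are names for configured directories (resolved here to explicit temporary directories) and that the relative filepath is returned unchanged; copy_file_sketch is a hypothetical helper.

```python
import os
import shutil
import tempfile
from typing import Dict


def copy_file_sketch(filepath: str, source_directory: str,
                     destination_directory: str) -> Dict[str, str]:
    """Copy a relative filepath from one root directory to another."""
    source = os.path.join(source_directory, filepath)
    destination = os.path.join(destination_directory, filepath)
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    shutil.copy2(source, destination)
    # Assumption: the relative filepath is what gets passed along.
    return {"filepath": filepath}


with tempfile.TemporaryDirectory() as temp_root, \
        tempfile.TemporaryDirectory() as filestore_root:
    # Create a file in the stand-in for the temporary directory.
    os.makedirs(os.path.join(temp_root, "pickle"))
    with open(os.path.join(temp_root, "pickle", "model.pkl"), "wb") as handle:
        handle.write(b"example bytes")

    result = copy_file_sketch("pickle/model.pkl", temp_root, filestore_root)
    copied = os.path.exists(os.path.join(filestore_root, "pickle", "model.pkl"))

print(result, copied)
```

Deserializing would be the same copy with the source and destination roles reversed, which matches the mirrored defaults in the signatures above.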

class simpleml.save_patterns.FilestoreCopyFilesLocation[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepaths, source_directory='filestore', destination_directory='system_temp', **kwargs)
Parameters
  • filepaths (List[str]) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

static serialize(filepaths, source_directory='system_temp', destination_directory='filestore', **kwargs)
Parameters
  • filepaths (List[str]) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.FilestoreCopyFolderLocation[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='filestore', destination_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

static serialize(filepath, source_directory='system_temp', destination_directory='filestore', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.FilestorePassthroughLocation[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(**kwargs)
Return type

Dict[str, str]

static serialize(**kwargs)
Return type

Dict[str, str]

class simpleml.save_patterns.KerasDiskH5SavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save keras objects to disk in h5 format

SAVE_PATTERN = keras_disk_h5[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.KerasDiskSavedModelSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save keras objects to disk in savedModel format

SAVE_PATTERN = keras_disk_saved_model[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.KerasH5Serializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

Uses Keras H5 serialization (legacy behavior)

Output is a single file

static deserialize(filepath, source_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

Return type

Dict[str, Any]

static serialize(obj, filepath, format_directory=HDF5_DIRECTORY, format_extension='.h5', destination_directory='system_temp', **kwargs)
Parameters
  • obj (Any) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.KerasLibcloudH5SavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save keras objects to cloud via apache-libcloud in h5 format

SAVE_PATTERN = keras_libcloud_h5[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.KerasLibcloudSavedModelSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save keras objects to cloud via apache-libcloud in savedModel format

SAVE_PATTERN = keras_libcloud_saved_model[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.KerasSavedModelSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

Uses Tensorflow SavedModel serialization

Output is a folder containing: assets, keras_metadata.pb, saved_model.pb, variables

static deserialize(filepath, source_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

Return type

Dict[str, Any]

static serialize(obj, filepath, format_directory=TENSORFLOW_SAVED_MODEL_DIRECTORY, format_extension='.savedModel', destination_directory='system_temp', **kwargs)
Parameters
  • obj (Any) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.LibcloudCopyFileLocation[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='libcloud_root_path', destination_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

static serialize(filepath, source_directory='system_temp', destination_directory='libcloud_root_path', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.LibcloudCopyFilesLocation[source]

Bases: simpleml.save_patterns.base.BaseSerializer

Libcloud transport for many individual files

classmethod deserialize(cls, filepaths, source_directory='libcloud_root_path', destination_directory='system_temp', **kwargs)
Parameters
  • filepaths (List[str]) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

static serialize(filepaths, source_directory='system_temp', destination_directory='libcloud_root_path', **kwargs)
Parameters
  • filepaths (List[str]) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.LibcloudCopyFolderLocation[source]

Bases: simpleml.save_patterns.base.BaseSerializer

Libcloud doesn't have a notion of folder objects, so iterate through filepaths individually

static common_path(paths)

Helper utility to return the common parent path for a bunch of filepaths

Parameters

paths (List[str]) –

Return type

str
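The behavior described for common_path matches what the standard library already offers. A minimal sketch, assuming forward-slash separated paths; common_path_sketch is a hypothetical stand-in and the file names are illustrative.

```python
import posixpath
from typing import List


def common_path_sketch(paths: List[str]) -> str:
    """Return the longest shared parent path of the given filepaths."""
    return posixpath.commonpath(paths)


paths = [
    "saved_model/my_model/saved_model.pb",
    "saved_model/my_model/variables/variables.index",
    "saved_model/my_model/assets/vocab.txt",
]
print(common_path_sketch(paths))  # saved_model/my_model
```

Using posixpath rather than os.path keeps the result independent of the host operating system, which suits object-store keys.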

classmethod deserialize(cls, filepaths, source_directory='libcloud_root_path', destination_directory='system_temp', **kwargs)
Parameters
  • filepaths (List[str]) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

static serialize(filepath, source_directory='system_temp', destination_directory='libcloud_root_path', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

  • destination_directory (str) –

Return type

Dict[str, str]
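Since an object store has no folder objects, saving a folder amounts to walking it and transferring each file under its own key. The sketch below illustrates that idea with an in-memory dictionary in place of a cloud container. It is not the Libcloud-based implementation: list_folder_files, upload_folder_sketch, and bucket are hypothetical, and returning the list of filepaths is an assumption based on the deserialize signature above.

```python
import os
import tempfile
from typing import Dict, List


def list_folder_files(root: str, folder: str) -> List[str]:
    """Return every file under root/folder as a path relative to root."""
    filepaths = []
    for directory, _, filenames in os.walk(os.path.join(root, folder)):
        for filename in filenames:
            absolute = os.path.join(directory, filename)
            relative = os.path.relpath(absolute, root)
            # Normalize separators so keys look the same on every platform.
            filepaths.append(relative.replace(os.sep, "/"))
    return sorted(filepaths)


def upload_folder_sketch(root: str, folder: str,
                         bucket: Dict[str, bytes]) -> Dict[str, List[str]]:
    # `bucket` is an in-memory stand-in for an object store container.
    filepaths = list_folder_files(root, folder)
    for filepath in filepaths:
        with open(os.path.join(root, filepath), "rb") as handle:
            bucket[filepath] = handle.read()
    return {"filepaths": filepaths}


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "my_model", "variables"))
    for name in ("saved_model.pb", os.path.join("variables", "variables.index")):
        with open(os.path.join(root, "my_model", name), "wb") as handle:
            handle.write(b"x")
    bucket: Dict[str, bytes] = {}
    result = upload_folder_sketch(root, "my_model", bucket)

print(result["filepaths"])
```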

class simpleml.save_patterns.PandasCSVSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

Return type

Dict[str, pandas.DataFrame]

static serialize(obj, filepath, format_directory=CSV_DIRECTORY, format_extension='.csv', destination_directory='system_temp', **kwargs)
Parameters
  • obj (pandas.DataFrame) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.PandasDiskCSVSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save pandas objects to disk in csv format

SAVE_PATTERN = pandas_disk_csv[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.PandasDiskJSONSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save pandas objects to disk in json format

SAVE_PATTERN = pandas_disk_json[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.PandasDiskParquetSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save pandas objects to disk in parquet format

SAVE_PATTERN = pandas_disk_parquet[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.PandasJSONSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

Return type

Dict[str, pandas.DataFrame]

static serialize(obj, filepath, format_directory=JSON_DIRECTORY, format_extension='.jsonl', destination_directory='system_temp', **kwargs)
Parameters
  • obj (pandas.DataFrame) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]
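The default '.jsonl' extension suggests the JSON Lines layout, with one JSON record per line. That reading is an assumption; the page does not state the exact layout. The sketch below shows the layout with the standard library only. With pandas installed, the equivalent calls would be DataFrame.to_json(path, orient='records', lines=True) and pandas.read_json(path, lines=True).

```python
import io
import json
from typing import Dict, List, TextIO


def write_json_lines(records: List[Dict], handle: TextIO) -> None:
    """Write one JSON document per line."""
    for record in records:
        handle.write(json.dumps(record) + "\n")


def read_json_lines(handle: TextIO) -> List[Dict]:
    """Parse each non-empty line as its own JSON document."""
    return [json.loads(line) for line in handle if line.strip()]


records = [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}]

# An in-memory buffer stands in for the file on disk.
buffer = io.StringIO()
write_json_lines(records, buffer)
buffer.seek(0)
restored = read_json_lines(buffer)

print(buffer.getvalue())
```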

class simpleml.save_patterns.PandasLibcloudCSVSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save pandas objects to cloud via apache-libcloud in csv format

SAVE_PATTERN = pandas_libcloud_csv[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.PandasLibcloudJSONSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save pandas objects to cloud via apache-libcloud in json format

SAVE_PATTERN = pandas_libcloud_json[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.PandasLibcloudParquetSavePattern[source]

Bases: base.BaseSavePattern

Save pattern implementation to save pandas objects to cloud via apache-libcloud in parquet format

SAVE_PATTERN = pandas_libcloud_parquet[source]
deserializers[source]
serializers[source]
class simpleml.save_patterns.PandasParquetSerializer[source]

Bases: simpleml.save_patterns.base.BaseSerializer

static deserialize(filepath, source_directory='system_temp', **kwargs)
Parameters
  • filepath (str) –

  • source_directory (str) –

Return type

Dict[str, pandas.DataFrame]

static serialize(obj, filepath, format_directory=PARQUET_DIRECTORY, format_extension='.parquet', destination_directory='system_temp', **kwargs)
Parameters
  • obj (pandas.DataFrame) –

  • filepath (str) –

  • format_directory (str) –

  • format_extension (str) –

  • destination_directory (str) –

Return type

Dict[str, str]

class simpleml.save_patterns.SavePatternDecorators[source]

Bases: object

Decorators that can be used for registering methods for loading and saving.

static deregister_save_pattern(cls_or_save_pattern=None, save=True, load=True)

Class-level decorator to deregister allowed save patterns. Doesn't actually make use of the class, but is included for completeness. It is recommended to use the importable deregister_save_pattern function directly.

Parameters
  • cls_or_save_pattern (Optional[str]) – the optional string or class denoting the pattern this class implements (e.g. disk_pickled). Checks the class attribute cls.SAVE_PATTERN if null. cls is automatically passed when calling the decorator without parameters (@SavePatternDecorators.deregister_save_pattern)

  • save (Optional[bool]) – optional bool; default true; whether to drop the decorated class as the save method for the registered save pattern

  • load (Optional[bool]) – optional bool; default true; whether to drop the decorated class as the load method for the registered save pattern

Return type

Callable

static register_save_pattern(cls_or_save_pattern=None, save=True, load=True, overwrite=False)

Decorates a class to register the method(s) to use for saving and/or loading for the particular pattern

IT IS ALLOWABLE TO HAVE DIFFERENT CLASSES HANDLE SAVING AND LOADING FOR THE SAME REGISTERED PATTERN

Parameters
  • cls_or_save_pattern (Optional[Union[str, Type]]) – the optional string or class denoting the pattern this class implements (e.g. disk_pickled). Checks the class attribute cls.SAVE_PATTERN if null. cls is automatically passed when calling the decorator without parameters (@SavePatternDecorators.register_save_pattern)

  • save (Optional[bool]) – optional bool; default true; whether to use the decorated class as the save method for the registered save pattern

  • load (Optional[bool]) – optional bool; default true; whether to use the decorated class as the load method for the registered save pattern

  • overwrite (Optional[bool]) – optional bool; default false; whether to overwrite the registered class for the save pattern, if it exists. Otherwise an error is raised

Return type

Callable
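The cls_or_save_pattern parameter lets one decorator work both bare and with a name. The sketch below shows one common way to write such a dual-mode decorator. It is an illustration of the calling convention only, not SimpleML's code; REGISTRY, register_sketch, and the two example classes are hypothetical, and the save, load, and overwrite flags are left out for brevity.

```python
from typing import Callable, Dict, Optional, Type, Union

REGISTRY: Dict[str, Type] = {}  # hypothetical registry


def register_sketch(cls_or_save_pattern: Optional[Union[str, Type]] = None) -> Callable:
    def register(cls: Type, save_pattern: Optional[str]) -> Type:
        # Fall back to the class attribute when no name is given.
        name = save_pattern or getattr(cls, "SAVE_PATTERN")
        REGISTRY[name] = cls
        return cls

    if isinstance(cls_or_save_pattern, type):
        # Bare usage (@register_sketch): the class itself is passed in.
        return register(cls_or_save_pattern, None)

    # Parameterized usage: @register_sketch("name") or @register_sketch()
    def wrapper(cls: Type) -> Type:
        return register(cls, cls_or_save_pattern)

    return wrapper


@register_sketch
class AttributeNamed:
    SAVE_PATTERN = "example_from_attribute"


@register_sketch("example_from_argument")
class ArgumentNamed:
    pass


print(sorted(REGISTRY))
```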

simpleml.save_patterns.deregister_save_pattern(cls=None, save_pattern=None, save=True, load=True)[source]

Deregister the class to use for saving and loading for the particular pattern

Parameters
  • save_pattern (Optional[str]) – the optional string denoting the pattern this class implements (e.g. disk_pickled). Checks class attribute cls.SAVE_PATTERN if null

  • save (Optional[bool]) – optional bool; default true; whether to remove the class as the save method for the registered save pattern

  • load (Optional[bool]) – optional bool; default true; whether to remove the class as the load method for the registered save pattern

  • cls (Optional[Type]) –

Return type

None

simpleml.save_patterns.register_save_pattern(cls, save_pattern=None, save=True, load=True, overwrite=False)[source]

Register the class to use for saving and loading for the particular pattern

IT IS ALLOWABLE TO HAVE DIFFERENT CLASSES HANDLE SAVING AND LOADING FOR THE SAME REGISTERED PATTERN

Parameters
  • save_pattern (Optional[str]) – the optional string denoting the pattern this class implements (e.g. disk_pickled). Checks class attribute cls.SAVE_PATTERN if null

  • save (Optional[bool]) – optional bool; default true; whether to use the decorated class as the save method for the registered save pattern

  • load (Optional[bool]) – optional bool; default true; whether to use the decorated class as the load method for the registered save pattern

  • overwrite (Optional[bool]) – optional bool; default false; whether to overwrite the registered class for the save pattern, if it exists. Otherwise an error is raised

  • cls (Type) –

Return type

None
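The interplay of the save, load, and overwrite flags can be sketched with two dictionaries. This is a simplified model written for illustration: the registry names, the register_sketch and deregister_sketch functions, and the choice of ValueError are assumptions (the documentation above only says an error is thrown).

```python
from typing import Dict, Optional, Type

SAVE_METHODS: Dict[str, Type] = {}  # hypothetical save registry
LOAD_METHODS: Dict[str, Type] = {}  # hypothetical load registry


def register_sketch(cls: Type, save_pattern: Optional[str] = None,
                    save: bool = True, load: bool = True,
                    overwrite: bool = False) -> None:
    name = save_pattern or cls.SAVE_PATTERN
    for enabled, registry in ((save, SAVE_METHODS), (load, LOAD_METHODS)):
        if not enabled:
            continue
        if name in registry and not overwrite:
            # Assumption: the concrete exception type is not documented.
            raise ValueError(f"Pattern {name!r} is already registered")
        registry[name] = cls


def deregister_sketch(cls: Optional[Type] = None, save_pattern: Optional[str] = None,
                      save: bool = True, load: bool = True) -> None:
    name = save_pattern or cls.SAVE_PATTERN
    if save:
        SAVE_METHODS.pop(name, None)
    if load:
        LOAD_METHODS.pop(name, None)


class Writer:
    SAVE_PATTERN = "example_split"


class Reader:
    SAVE_PATTERN = "example_split"


# Different classes handle saving and loading for the same pattern.
register_sketch(Writer, load=False)
register_sketch(Reader, save=False)

try:
    register_sketch(Reader, save=False)  # same slot again, overwrite=False
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# Drop only the save side; the load side stays registered.
deregister_sketch(save_pattern="example_split", load=False)
print(SAVE_METHODS, LOAD_METHODS, duplicate_rejected)
```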