simpleml.persistables.base_persistable

Base class for all database tracked records, called “Persistables”

Module Contents

Classes

Persistable

Base class for all SimpleML persistable objects.

Attributes

LOGGER

__author__

simpleml.persistables.base_persistable.LOGGER[source]
simpleml.persistables.base_persistable.__author__ = Elisha Yadgaran[source]
class simpleml.persistables.base_persistable.Persistable(id=None, hash_=None, name='default', has_external_files=False, author='default', project='default', version=None, version_description='', save_patterns=None, filepaths=None, metadata_=None, **kwargs)[source]

Bases: simpleml.persistables.hashing.CustomHasherMixin

Base class for all SimpleML persistable objects.

Uses private class attributes for the internal artifact registry. The registry does not need to be persisted because it is populated on import (and can therefore change between versions):

cls._ARTIFACT_{artifact_name} = {'save': save_attribute, 'restore': restore_attribute}
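The registry convention can be illustrated with a minimal sketch. The class `RegistrySketch`, its `artifact_registry` helper, and the attribute names `fitted_model` / `_fitted_model` are hypothetical and only show how `_ARTIFACT_{artifact_name}` class attributes could be discovered; this is not SimpleML's own implementation.

```python
from typing import Dict


class RegistrySketch:
    """Hypothetical stand-in for a persistable class (not SimpleML code)."""

    # Registry entry following the documented naming convention:
    # cls._ARTIFACT_{artifact_name} = {"save": save_attribute, "restore": restore_attribute}
    _ARTIFACT_model = {"save": "fitted_model", "restore": "_fitted_model"}

    @classmethod
    def artifact_registry(cls) -> Dict[str, Dict[str, str]]:
        """Collect every _ARTIFACT_* class attribute, keyed by artifact name."""
        prefix = "_ARTIFACT_"
        return {
            attr[len(prefix):]: getattr(cls, attr)
            for attr in dir(cls)
            if attr.startswith(prefix)
        }


registry = RegistrySketch.artifact_registry()
```

Because the entries live on the class, they are rebuilt every time the module is imported, which matches the note that the registry itself does not need to be persisted.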

id: Random UUID(4). Used over an auto-incrementing id to minimize collision probability with distributed trainings and authors (especially if using a central server to combine results across different instantiations of SimpleML)

hash_id: Hash of the object, used to uniquely identify the contents at train time

registered_name: Class name of the object, defined when importing. Can be used for the drag and drop GUI and for prescribing training config

author: Creator

project: Project the objects are associated with. Useful if multiple persistables relate to the same project and should be grouped (but have different names). Also good for implementing row-based security across teams

name: Friendly name. Primary way of tracking the evolution of the “same” object over time

version: Autoincrementing id of the “friendly name”

version_description: Description that explains what is new or different about this version

# Persistence of fitted states

has_external_files: Boolean field to signify the presence of saved files not in the (main) db

filepaths: JSON object with external file details

The nested notation is because any persistable can implement multiple save options (with arbitrary priority) and arbitrary inputs. Simple serialization could have only a single string location, whereas complex artifacts might have a list or map of filepaths.

Structure:

{
    artifact_name: {
        'save_pattern': filepath_data
    },
    "example": {
        "disk_pickled": path to file, relative to base simpleml folder (default ~/.simpleml),
        "database": {"schema": schema, "table": table_name},  # (for files extractable with select * from)
        …
    }
}
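As a concrete illustration of the filepaths structure, the sketch below builds one such object in Python. The file name, schema, and table values are hypothetical placeholders; only the nesting (artifact name, then save pattern, then location data) comes from the description above.

```python
import json
from typing import Any, Dict

# artifact_name -> save_pattern -> filepath_data
filepaths: Dict[str, Dict[str, Any]] = {
    "example": {
        # Hypothetical path, relative to the base simpleml folder
        "disk_pickled": "example.pkl",
        # Hypothetical schema and table names
        "database": {"schema": "example_schema", "table": "example_table"},
    }
}

# The structure is plain JSON, so it survives a serialization round trip
roundtrip = json.loads(json.dumps(filepaths))

# Each save pattern may store a different shape of location data
disk_location = filepaths["example"]["disk_pickled"]
database_location = filepaths["example"]["database"]
```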

metadata: Generic JSON store for random attributes

Parameters
  • id (uuid.UUID) –

  • hash_ (str) –

  • name (Optional[str]) –

  • has_external_files (bool) –

  • author (Optional[str]) –

  • project (Optional[str]) –

  • version (Optional[int]) –

  • version_description (Optional[str]) –

  • save_patterns (Optional[Dict[str, List[str]]]) –

  • filepaths (Optional[Dict]) –

  • metadata_ (Optional[Dict]) –

object_type = PERSISTABLE[source]
__post_init__(self)[source]
__post_restore__(self)[source]
_configure_unmapped_attributes(self)[source]

Unified entry point for unmapped attributes, which need to be restored when loading classes.

_get_latest_version(self)[source]

Versions should be autoincrementing for each object (constrained over friendly name). Executes a database lookup and increments.

Return type

int
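The lookup-and-increment idea can be sketched with an in-memory SQLite database. The table `records`, its columns, and the function `next_version` are hypothetical, and starting the count at 1 for an unseen name is an assumption; the actual query SimpleML runs is not shown in this page.

```python
import sqlite3


def next_version(conn: sqlite3.Connection, name: str) -> int:
    """Return the next version for a friendly name (assumes versions start at 1)."""
    row = conn.execute(
        "SELECT MAX(version) FROM records WHERE name = ?", (name,)
    ).fetchone()
    latest = row[0]  # None when no record with this name exists yet
    return 1 if latest is None else latest + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("titanic", 1), ("titanic", 2), ("iris", 1)],
)

existing_name_version = next_version(conn, "titanic")
new_name_version = next_version(conn, "unseen")
```

The version is scoped to the friendly name, so records with different names increment independently.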

abstract _hash(self)[source]

Each subclass should implement a hashing routine to uniquely AND consistently identify the object contents. Consistency is important to ensure ability to assert identity across code definitions
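One way to get a hash that is both unique to the contents and consistent across runs is to serialize the contents deterministically and digest the result. The sketch below uses `json` and `hashlib` from the standard library; the function `stable_hash` and the sample contents are hypothetical, and this is a swapped-in technique rather than the routine provided by CustomHasherMixin.

```python
import hashlib
import json
from typing import Any, Dict


def stable_hash(contents: Dict[str, Any]) -> str:
    """Hash JSON-serializable contents independently of key insertion order."""
    # Sorted keys and fixed separators give one canonical string per content
    serialized = json.dumps(contents, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


first = stable_hash({"pipeline": "abc", "params": {"alpha": 0.1, "beta": 2}})
second = stable_hash({"params": {"beta": 2, "alpha": 0.1}, "pipeline": "abc"})
different = stable_hash({"pipeline": "abc", "params": {"alpha": 0.2, "beta": 2}})
```

Python's built-in `hash()` on strings is randomized per process by default, which is why a digest of a canonical serialization is used here instead.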

property config(self)[source]
Return type

Dict[str, Any]

classmethod from_dict(cls, **kwargs)[source]

Parameterize a persistable from a dict. Used in deserialization from ORM objects

Return type

Persistable

get_artifact(self, artifact_name)[source]

Accessor method to lookup the artifact in the registry and return the corresponding data value

Parameters

artifact_name (str) –

Return type

Any

property library_versions(self)[source]
Return type

Dict[str, str]

load_external_file(self, artifact_name, save_pattern, cls=None)[source]

Defines the pattern for loading external files and returns the object for assignment. This is the inverse of saving: registered functions should take in the same data (in the same form) as what is saved in the filepath.

Parameters
  • artifact_name (str) –

  • save_pattern (str) –

  • cls (Optional[Type]) –

Return type

Any

load_external_files(self, artifact_name=None)[source]

Main routine to restore registered external artifacts. Will iterate through save patterns and break after the first successful restore (allows robustness in the event of unavailable resources)

Parameters

artifact_name (Optional[str]) –

Return type

None
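The "iterate and break after the first successful restore" behavior can be sketched as a simple fallback loop. The function `restore_first_available` and the loader callables are hypothetical; the pattern names reuse the `database` and `disk_pickled` keys from the filepaths example.

```python
from typing import Any, Callable, Dict, List


def restore_first_available(
    save_patterns: List[str], loaders: Dict[str, Callable[[], Any]]
) -> Any:
    """Try each save pattern in order and return the first successful restore."""
    failures: Dict[str, Exception] = {}
    for save_pattern in save_patterns:
        try:
            return loaders[save_pattern]()
        except Exception as error:  # broad on purpose: any failure means "try the next one"
            failures[save_pattern] = error
    raise RuntimeError(f"No save pattern could be restored: {failures}")


def unavailable_loader() -> Any:
    # Simulates a resource that cannot be reached
    raise ConnectionError("resource unavailable")


loaders = {
    "database": unavailable_loader,
    "disk_pickled": lambda: "restored artifact",
}
result = restore_first_available(["database", "disk_pickled"], loaders)
```

Here the first pattern fails and the second succeeds, so the artifact is still restored even though one resource is unavailable.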

load_if_unloaded(self, artifact_name)[source]

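Convenience method to load an artifact if it is not already loaded. Easy drop-in for property methods:

```
@property
def artifact(self):
    # "artifact_name" and "artifact_attribute" are placeholders for the registered names
    self.load_if_unloaded("artifact_name")
    # Fall back to creating the artifact if loading did not set the attribute
    if not hasattr(self, "artifact_attribute"):
        self.create_artifact()
    return self.artifact_attribute
```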

Parameters

artifact_name (str) –

Return type

None

property orm_cls(self)[source]
restore_artifact(self, artifact_name, obj)[source]

Setter method to lookup the restore attribute and set to the passed object

Parameters
  • artifact_name (str) –

  • obj (Any) –

Return type

None
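The accessor and setter pair (`get_artifact` and `restore_artifact`) can be sketched against the `_ARTIFACT_{artifact_name}` registry convention. The class `ArtifactAccessSketch` and the attribute names `model` / `_model` are hypothetical; the method bodies are an assumption about how such a lookup could work, not SimpleML's source.

```python
from typing import Any


class ArtifactAccessSketch:
    """Hypothetical stand-in for a persistable (not SimpleML code)."""

    # "save" names the attribute read by the getter,
    # "restore" names the attribute written by the setter
    _ARTIFACT_model = {"save": "model", "restore": "_model"}

    def __init__(self) -> None:
        self._model = None

    @property
    def model(self) -> Any:
        return self._model

    def get_artifact(self, artifact_name: str) -> Any:
        """Look up the registry entry and return the value of its save attribute."""
        registered = getattr(self, f"_ARTIFACT_{artifact_name}")
        return getattr(self, registered["save"])

    def restore_artifact(self, artifact_name: str, obj: Any) -> None:
        """Look up the registry entry and assign obj to its restore attribute."""
        registered = getattr(self, f"_ARTIFACT_{artifact_name}")
        setattr(self, registered["restore"], obj)


sketch = ArtifactAccessSketch()
sketch.restore_artifact("model", {"weights": [0.1, 0.2]})
restored = sketch.get_artifact("model")
```

Keeping separate save and restore attributes lets the public attribute be a read-only property while the restore step writes to the private one.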

save(self)[source]

Each subclass needs to implement a save routine to persist to the database and any other required filestore.

sqlalchemy_mixins supports active record style TableModel.save(), so subclasses can still call super(Persistable, self).save()

Return type

None

save_external_file(self, artifact_name, save_pattern, cls=None, **save_params)[source]

Abstracted pattern to save an artifact via one of the registered patterns and update the filepaths location

Parameters
  • artifact_name (str) –

  • save_pattern (str) –

  • cls (Optional[Type]) –

Return type

None

save_external_files(self)[source]

Main routine to save registered external artifacts. Each save pattern is defined using the standard api for the save params defined here. If a pattern requires more imports, it needs to be added here

Uses a standardized nomenclature to reuse params regardless of save pattern:

{
    'persistable_id': the database id of the persistable. Typically used as the root name of the saved object; implementations will pre/suffix,
    'persistable_type': the persistable type (DATASET/PIPELINE..),
    'overwrite': boolean. Shortcut in case a save pattern redefines a serialization routine
}

Return type

None
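The shared save params can be pictured as one dictionary passed unchanged to every save pattern. In the sketch below, the two pattern functions, their return values, and the id `1234` are hypothetical; only the three parameter names come from the description above.

```python
from typing import Any, Callable, Dict


def disk_location(persistable_id: str, persistable_type: str, overwrite: bool) -> str:
    """Hypothetical disk pattern: prefix and suffix the id to form a file name."""
    return f"{persistable_type.lower()}-{persistable_id}.pkl"


def database_location(
    persistable_id: str, persistable_type: str, overwrite: bool
) -> Dict[str, str]:
    """Hypothetical database pattern: derive schema and table from the same params."""
    return {
        "schema": persistable_type,
        "table": f"{persistable_type.lower()}_{persistable_id}",
    }


save_params: Dict[str, Any] = {
    "persistable_id": "1234",
    "persistable_type": "DATASET",
    "overwrite": False,
}

patterns: Dict[str, Callable[..., Any]] = {
    "disk_pickled": disk_location,
    "database": database_location,
}

# Every pattern receives the same standardized params
locations = {name: save(**save_params) for name, save in patterns.items()}
```

Because each pattern accepts the same keyword arguments, adding a new pattern does not require changing the caller.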

property state(self)[source]
Return type

Dict[str, Any]

to_dict(self)[source]