simpleml.orm

ORM package

Handles all database-related interaction. It is intentionally separated so that the parallel Persistable objects only deal with the glue interactions across Python types and libraries.

Each mapped Persistable table model has a 1:1 parallel class defined as a native Python object.

Package Contents

Classes

AlembicDatabase

Base database class to manage dbs with schema tracking. Includes alembic config references

BaseDatabase

Base Database class to configure db connection

BinaryStorageDatabase

Hardcoded database mapped to binary storage metadata

Database

SimpleML specific configuration to interact with the database

DatasetDatabase

Hardcoded database mapped to dataset storage metadata

ORMDataset

Base class for all Dataset objects.

ORMMetric

Abstract Base class for all Metric objects

ORMModel

Base class for all Model objects. Defines the required parameters for versioning

ORMPipeline

Base class for all Pipeline objects.

Attributes

ORM_REGISTRY

__author__

simpleml.orm.ORM_REGISTRY[source]
simpleml.orm.__author__ = Elisha Yadgaran[source]
class simpleml.orm.AlembicDatabase(alembic_filepath, script_location='migrations', *args, **kwargs)[source]

Bases: BaseDatabase

Base database class to manage dbs with schema tracking. Includes alembic config references

Parameters
  • use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

property alembic_config(self)
create_tables(self, base, drop_tables=False, ignore_errors=False)

Creates database tables (and potentially drops existing ones). Assumes it is running as a sufficiently privileged user

Parameters

drop_tables – Whether or not to drop the existing tables first.

Returns

None

downgrade(self, revision)

Proxy method to invoke the alembic downgrade command to the specified revision. Indirectly runs the alembic env.py code

initialize(self, base_list, upgrade=False, validate=True, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

upgrade(self, revision='head')

Proxy method to invoke the alembic upgrade command to the specified revision. Indirectly runs the alembic env.py code

validate_schema_version(self, base_list)

Check that the newly initialized database is up to date. Raises an error otherwise (ahead of any table model mismatches later)
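The check described above can be illustrated with a minimal sketch. It is not SimpleML's implementation: it only relies on the fact that alembic records the current revision in an `alembic_version` table with a `version_num` column. The function signature, the `SchemaOutOfDateError` class, and the revision ids are hypothetical stand-ins (the real method takes `base_list`).

```python
import sqlite3


class SchemaOutOfDateError(Exception):
    """Hypothetical stand-in for the SimpleML error mentioned above."""


def validate_schema_version(conn, expected_head):
    """Raise if the revision recorded in the database differs from the expected head."""
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    current = row[0] if row else None
    if current != expected_head:
        raise SchemaOutOfDateError(
            f"Database is at revision {current!r}, expected {expected_head!r}"
        )


# In-memory database that mimics the bookkeeping table alembic maintains
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES ('abc123')")

validate_schema_version(conn, "abc123")  # revisions match, so nothing is raised
```

Failing early on a revision mismatch gives a clearer error than waiting for a query to hit a missing column or table.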

class simpleml.orm.BaseDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)[source]

Bases: object

Base Database class to configure db connection. Does not assume schema tracking or any other validation

Starting in SQLAlchemy 1.4.2, sqlalchemy.engine.url.URL is an immutable object without an __init__

Parameters
  • use_ssh_tunnel (bool) – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params (Optional[Dict[str, Any]]) – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

  • config (Optional[Dict[str, Any]]) –

  • configuration_section (Optional[str]) –

  • uri (Optional[str]) –
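As a rough illustration of the `sshtunnel_params` argument, the dictionary below uses keyword names accepted by `SSHTunnelForwarder` in the sshtunnel project. The host names, user, key path, and ports are made-up placeholders, and how SimpleML merges these values with the database credentials is not shown here.

```python
# Hypothetical values; the keys are SSHTunnelForwarder keyword arguments
sshtunnel_params = {
    "ssh_address_or_host": ("bastion.example.com", 22),  # jump host and SSH port
    "ssh_username": "deploy",
    "ssh_pkey": "~/.ssh/id_rsa",  # private key used to authenticate
    "remote_bind_address": ("127.0.0.1", 5432),  # database as seen from the jump host
}

# These would be passed alongside use_ssh_tunnel=True when constructing the database
database_kwargs = {"use_ssh_tunnel": True, "sshtunnel_params": sshtunnel_params}
```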

__repr__(self)

Return repr(self).

__str__(self)

Return str(self).

_initialize(self, base, create_tables=False, **kwargs)

Initialization method to set up database connection and inject session manager

Parameters
  • create_tables – Bool, whether to create tables in database

  • drop_tables – Bool, whether to drop existing tables in database

Returns

None

close_tunnel(self)
Return type

None

configure_ssh_tunnel(self, credentials, ssh_config)
Parameters
  • credentials (Dict[str, Any]) –

  • ssh_config (Dict[str, Any]) –

Return type

Tuple[Dict[str, Any], Dict[str, Any]]

create_tables(self, base, drop_tables=False, ignore_errors=False)

Creates database tables (and potentially drops existing ones). Assumes it is running as a sufficiently privileged user

Parameters
  • drop_tables (bool) – Whether or not to drop the existing tables first.

  • ignore_errors (bool) –

Returns

None

Return type

None

property engine(self)
Return type

Any

initialize(self, base_list, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

open_tunnel(self)
Return type

None

property ssh_tunnel(self)
Return type

simpleml.imports.SSHTunnelForwarder

class simpleml.orm.BinaryStorageDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)[source]

Bases: BaseDatabase

Hardcoded database mapped to binary storage metadata

Parameters
  • use_ssh_tunnel (bool) – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params (Optional[Dict[str, Any]]) – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

  • config (Optional[Dict[str, Any]]) –

  • configuration_section (Optional[str]) –

  • uri (Optional[str]) –

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.orm.Database(configuration_section=None, uri=None, database=None, username=None, password=None, drivername=None, host=None, port=None, query=None, *args, **kwargs)[source]

Bases: AlembicDatabase

SimpleML specific configuration to interact with the database. Defaults to sqlite db in filestore directory

Parameters
  • use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.orm.DatasetDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)[source]

Bases: BaseDatabase

Hardcoded database mapped to dataset storage metadata

Parameters
  • use_ssh_tunnel (bool) – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params (Optional[Dict[str, Any]]) – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

  • config (Optional[Dict[str, Any]]) –

  • configuration_section (Optional[str]) –

  • uri (Optional[str]) –

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.orm.ORMDataset[source]

Bases: simpleml.orm.persistable.ORMPersistable

Base class for all Dataset objects.

pipeline_id: foreign key relation to the dataset pipeline used as input

__table_args__
__tablename__ = datasets
pipeline
pipeline_id
classmethod load_pipeline(cls, id)
Parameters

id (str) –

class simpleml.orm.ORMMetric[source]

Bases: simpleml.orm.persistable.ORMPersistable

Abstract Base class for all Metric objects

name: the metric name

values: JSON object with key: value pairs for performance on test dataset (ex: FPR: TPR to create ROC Curve). Singular value metrics take the form {'agg': value}

model_id: foreign key to the model that was used to generate predictions

dataset_id:
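The two shapes of `values` described above might look like the following sketch. The numbers are invented for illustration. One practical detail of JSON storage is also shown: object keys are always strings, so numeric keys come back as strings after a round trip.

```python
import json

# Curve-style metric: each key (FPR) maps to its value (TPR)
roc_values = {0.0: 0.0, 0.1: 0.6, 0.5: 0.9, 1.0: 1.0}

# Singular value metric: a single aggregate under the 'agg' key
accuracy_values = {"agg": 0.93}

# Serializing to JSON coerces the float keys to strings
round_tripped = json.loads(json.dumps(roc_values))
```

Code that reads a curve back from storage may therefore need to convert the keys with `float(key)` before plotting.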

__table_args__
__tablename__ = metrics
dataset
dataset_id
model
model_id
values
classmethod get_latest_version(cls, name, model_id)

Versions should be autoincrementing for each object (constrained over friendly name and model). Executes a database lookup and increments.

Parameters
  • name (str) –

  • model_id (str) –

Return type

int
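The lookup-and-increment behavior can be sketched with the standard library's sqlite3 module. This is an assumption-laden illustration, not the SimpleML code: the table layout, the `version` column, the free-function signature, and the choice to start at 1 when no rows exist are all hypothetical.

```python
import sqlite3


def get_latest_version(conn, name, model_id):
    """Return the next version for (name, model_id): highest stored version plus one."""
    row = conn.execute(
        "SELECT MAX(version) FROM metrics WHERE name = ? AND model_id = ?",
        (name, model_id),
    ).fetchone()
    # MAX() yields NULL (None) when no rows match, so fall back to 0
    return (row[0] or 0) + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (name TEXT, model_id TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [("accuracy", "m1", 1), ("accuracy", "m1", 2), ("accuracy", "m2", 1)],
)

next_version = get_latest_version(conn, "accuracy", "m1")
```

Scoping the lookup to both the name and the model keeps version counters independent, so the same metric computed for a different model starts its own sequence.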

classmethod load_dataset(cls, id)
Parameters

id (str) –

classmethod load_model(cls, id)
Parameters

id (str) –

class simpleml.orm.ORMModel[source]

Bases: simpleml.orm.persistable.ORMPersistable

Base class for all Model objects. Defines the required parameters for versioning; all other metadata can be stored in the arbitrary metadata field

params: model parameter metadata for easy insight into hyperparameters across trainings

feature_metadata: metadata insight into resulting features and importances

pipeline_id: foreign key relation to the pipeline used to transform input to the model (training is also dependent on originating dataset but scoring only needs access to the pipeline)

__table_args__
__tablename__ = models
feature_metadata
params
pipeline
pipeline_id
classmethod load_pipeline(cls, id)
Parameters

id (str) –

class simpleml.orm.ORMPipeline[source]

Bases: simpleml.orm.persistable.ORMPersistable

Base class for all Pipeline objects.

params: pipeline parameter metadata for easy insight into hyperparameters across trainings

dataset_id: foreign key relation to the dataset used as input

__table_args__
__tablename__ = pipelines
dataset
dataset_id
params
classmethod load_dataset(cls, id)
Parameters

id (str) –