simpleml.orm
ORM package
Handles all database-related interaction. It is intentionally separated so that the parallel Persistable objects only deal with the glue interactions across Python types and libraries.
Each mapped Persistable table model has a 1:1 parallel class defined as a native Python object.
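The 1:1 pairing described above can be pictured with a small standard-library sketch. The names RecordSketch and PersistableSketch, their fields, and the to_record/from_record helpers are hypothetical and are not part of simpleml; the package's real table models are ORM classes, and this only illustrates the separation of concerns.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RecordSketch:
    """Hypothetical stand-in for a mapped table model: column values only."""

    name: str
    version: int
    metadata_: Dict[str, Any] = field(default_factory=dict)


class PersistableSketch:
    """Hypothetical native Python object paired 1:1 with RecordSketch."""

    record_class = RecordSketch

    def __init__(self, name: str, version: int = 1,
                 metadata: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        self.version = version
        self.metadata = dict(metadata or {})

    def to_record(self) -> RecordSketch:
        # The native object hands plain values to the database-facing class.
        return self.record_class(self.name, self.version, dict(self.metadata))

    @classmethod
    def from_record(cls, record: RecordSketch) -> "PersistableSketch":
        # Rebuild the native object from the stored column values.
        return cls(record.name, record.version, record.metadata_)


obj = PersistableSketch("example", version=2, metadata={"owner": "me"})
record = obj.to_record()
restored = PersistableSketch.from_record(record)
```

Keeping the record class free of behaviour is the design point: database concerns stay on one side, Python-level logic on the other.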
Submodules
Package Contents
Classes
AlembicDatabase | Base database class to manage dbs with schema tracking. Includes alembic config references
BaseDatabase | Base Database class to configure db connection
BinaryStorageDatabase | Hardcoded database mapped to binary storage metadata
Database | SimpleML specific configuration to interact with the database
DatasetDatabase | Hardcoded database mapped to dataset storage metadata
ORMDataset | Base class for all Dataset objects.
ORMMetric | Abstract Base class for all Metric objects
ORMModel | Base class for all Model objects. Defines the required parameters for versioning
ORMPipeline | Base class for all Pipeline objects.
Attributes
- class simpleml.orm.AlembicDatabase(alembic_filepath, script_location='migrations', *args, **kwargs)
Bases: BaseDatabase
Base database class to manage dbs with schema tracking. Includes alembic config references
- Parameters
use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not
sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough
- property alembic_config(self)
- create_tables(self, base, drop_tables=False, ignore_errors=False)
Creates database tables (and potentially drops existing ones). Assumes it is running as a sufficiently privileged user.
- Parameters
drop_tables – Whether or not to drop the existing tables first.
- Returns
None
- downgrade(self, revision)
Proxy method to invoke the alembic downgrade command to the specified revision. Indirectly runs the alembic env.py code.
- initialize(self, base_list, upgrade=False, validate=True, **kwargs)
Initialization method to set up database connection and inject session manager
Raises a SimpleML error if database schema is not up to date
- Parameters
drop_tables – Bool, whether to drop existing tables in database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
- upgrade(self, revision='head')
Proxy method to invoke the alembic upgrade command to the specified revision. Indirectly runs the alembic env.py code.
- validate_schema_version(self, base_list)
Checks that the newly initialized database is up to date. Raises an error otherwise (ahead of any table model mismatches later).
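A schema version check of this kind can be sketched with the standard library alone. Alembic records the current revision in a table named alembic_version with a version_num column; everything else here (the check_schema_version function, the SchemaOutOfDateError class, and the revision strings) is hypothetical, and how simpleml actually performs the check is not shown in this reference.

```python
import sqlite3


class SchemaOutOfDateError(Exception):
    """Hypothetical stand-in for the SimpleML error raised on a mismatch."""


def check_schema_version(connection, expected_revision):
    """Compare the revision stored by alembic with the expected one."""
    row = connection.execute(
        "SELECT version_num FROM alembic_version"
    ).fetchone()
    current = row[0] if row else None
    if current != expected_revision:
        raise SchemaOutOfDateError(
            f"database is at revision {current!r}, expected {expected_revision!r}"
        )
    return current


# In-memory database that mimics the bookkeeping table alembic maintains.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES ('abc123')")

current_revision = check_schema_version(conn, "abc123")
```

Failing early on a revision mismatch gives a clearer error than waiting for a query to hit a missing column or table.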
- class simpleml.orm.BaseDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)
Bases: object
Base Database class to configure db connection. Does not assume schema tracking or any other validation.
Note: starting in sqlalchemy 1.4.2, sqlalchemy.engine.url.URL is an immutable object without an __init__.
- Parameters
use_ssh_tunnel (bool) – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not
sshtunnel_params (Optional[Dict[str, Any]]) – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough
config (Optional[Dict[str, Any]]) –
configuration_section (Optional[str]) –
uri (Optional[str]) –
- __repr__(self)
Return repr(self).
- __str__(self)
Return str(self).
- _initialize(self, base, create_tables=False, **kwargs)
Initialization method to set up database connection and inject session manager
- Parameters
create_tables – Bool, whether to create tables in database
drop_tables – Bool, whether to drop existing tables in database
- Returns
None
- close_tunnel(self)
- Return type
None
- configure_ssh_tunnel(self, credentials, ssh_config)
- create_tables(self, base, drop_tables=False, ignore_errors=False)
Creates database tables (and potentially drops existing ones). Assumes it is running as a sufficiently privileged user.
- property engine(self)
- Return type
Any
- initialize(self, base_list, **kwargs)
Initialization method to set up database connection and inject session manager
Raises a SimpleML error if database schema is not up to date
- Parameters
drop_tables – Bool, whether to drop existing tables in database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
- open_tunnel(self)
- Return type
None
- property ssh_tunnel(self)
- Return type
simpleml.imports.SSHTunnelForwarder
- class simpleml.orm.BinaryStorageDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)
Bases: BaseDatabase
Hardcoded database mapped to binary storage metadata
- Parameters
use_ssh_tunnel (bool) – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not
sshtunnel_params (Optional[Dict[str, Any]]) – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough
config (Optional[Dict[str, Any]]) –
configuration_section (Optional[str]) –
uri (Optional[str]) –
- initialize(self, base_list=None, **kwargs)
Initialization method to set up database connection and inject session manager
Raises a SimpleML error if database schema is not up to date
- Parameters
drop_tables – Bool, whether to drop existing tables in database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
- class simpleml.orm.Database(configuration_section=None, uri=None, database=None, username=None, password=None, drivername=None, host=None, port=None, query=None, *args, **kwargs)
Bases: AlembicDatabase
SimpleML specific configuration to interact with the database. Defaults to a sqlite db in the filestore directory.
- Parameters
use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not
sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough
- initialize(self, base_list=None, **kwargs)
Initialization method to set up database connection and inject session manager
Raises a SimpleML error if database schema is not up to date
- Parameters
drop_tables – Bool, whether to drop existing tables in database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
- class simpleml.orm.DatasetDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)
Bases: BaseDatabase
Hardcoded database mapped to dataset storage metadata
- Parameters
use_ssh_tunnel (bool) – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not
sshtunnel_params (Optional[Dict[str, Any]]) – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough
config (Optional[Dict[str, Any]]) –
configuration_section (Optional[str]) –
uri (Optional[str]) –
- initialize(self, base_list=None, **kwargs)
Initialization method to set up database connection and inject session manager
Raises a SimpleML error if database schema is not up to date
- Parameters
drop_tables – Bool, whether to drop existing tables in database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
- class simpleml.orm.ORMDataset
Bases: simpleml.orm.persistable.ORMPersistable
Base class for all Dataset objects.
pipeline_id: foreign key relation to the dataset pipeline used as input
- __table_args__
- __tablename__ = 'datasets'
- pipeline
- pipeline_id
- class simpleml.orm.ORMMetric
Bases: simpleml.orm.persistable.ORMPersistable
Abstract Base class for all Metric objects
name: the metric name
values: JSON object with key: value pairs for performance on test dataset (ex: FPR: TPR to create ROC Curve). Singular value metrics take the form {'agg': value}
model_id: foreign key to the model that was used to generate predictions
dataset_id:
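The two shapes of the values field described above can be illustrated as plain dictionaries. The numbers below are made up for illustration only, and the JSON round trip is an assumption about how such a field would be serialized, not a statement about simpleml's storage code.

```python
import json

# Curve-style metric: each key/value pair is one point, here FPR -> TPR.
# The numbers are illustrative, not real results.
roc_values = {0.0: 0.0, 0.1: 0.6, 0.5: 0.9, 1.0: 1.0}

# Singular value metric: one number stored under the 'agg' key.
accuracy_values = {"agg": 0.93}

# JSON object keys are always strings, so numeric keys come back as strings
# after a round trip and need converting if they are used as numbers again.
roundtrip = json.loads(json.dumps(roc_values))
restored = {float(fpr): tpr for fpr, tpr in roundtrip.items()}
```

Anything that reads curve-style values back out of a JSON column should expect string keys for this reason.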
- __table_args__
- __tablename__ = 'metrics'
- dataset
- dataset_id
- model
- model_id
- values
- classmethod get_latest_version(cls, name, model_id)
Versions should be autoincrementing for each object (constrained over friendly name and model). Executes a database lookup and increments.
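The lookup-and-increment idea can be sketched against an in-memory sqlite table. The table layout, the sample rows, and the next_version helper are hypothetical; the real method is a classmethod on the ORM model and its query is not shown in this reference.

```python
import sqlite3

# Hypothetical, simplified stand-in for the metrics table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (name TEXT, model_id TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        ("accuracy", "model-a", 1),
        ("accuracy", "model-a", 2),
        ("accuracy", "model-b", 1),
    ],
)


def next_version(connection, name, model_id):
    """Look up the highest stored version for (name, model_id) and add one."""
    row = connection.execute(
        "SELECT MAX(version) FROM metrics WHERE name = ? AND model_id = ?",
        (name, model_id),
    ).fetchone()
    # MAX() over zero rows yields NULL, so a first version starts at 1.
    return (row[0] or 0) + 1
```

Scoping the lookup to both name and model_id is what makes versions increment independently for each metric and model pair.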
- class simpleml.orm.ORMModel
Bases: simpleml.orm.persistable.ORMPersistable
Base class for all Model objects. Defines the required parameters for versioning; all other metadata can be stored in the arbitrary metadata field.
params: model parameter metadata for easy insight into hyperparameters across trainings
feature_metadata: metadata insight into resulting features and importances
pipeline_id: foreign key relation to the pipeline used to transform input to the model (training is also dependent on originating dataset but scoring only needs access to the pipeline)
- __table_args__
- __tablename__ = 'models'
- feature_metadata
- params
- pipeline
- pipeline_id
- class simpleml.orm.ORMPipeline
Bases: simpleml.orm.persistable.ORMPersistable
Base class for all Pipeline objects.
params: pipeline parameter metadata for easy insight into hyperparameters across trainings
dataset_id: foreign key relation to the dataset used as input
- __table_args__
- __tablename__ = 'pipelines'
- dataset
- dataset_id
- params