simpleml.utils

Utilities

Package Contents

Classes

AlembicDatabase

Base database class to manage dbs with schema tracking. Includes alembic config references

BaseDatabase

Base Database class to configure db connection

BinaryStorageDatabase

Hardcoded database mapped to binary storage metadata

Database

SimpleML specific configuration to interact with the database

DatasetCreator

DatasetDatabase

Hardcoded database mapped to dataset storage metadata

MetricCreator

ModelCreator

PersistableCreator

PersistableLoader

Wrapper class to load various persistables

PipelineCreator

simpleml.utils.CONFIG[source]
simpleml.utils.FILESTORE_DIRECTORY[source]
simpleml.utils.HDF5_FILESTORE_DIRECTORY[source]
simpleml.utils.PICKLED_FILESTORE_DIRECTORY[source]
simpleml.utils.SIMPLEML_DIRECTORY[source]
simpleml.utils.__author__ = Elisha Yadgaran[source]

exception simpleml.utils.DatasetError(*args, **kwargs)[source]

Bases: simpleml.utils.errors.SimpleMLError

Common base class for all non-exit exceptions.

Initialize self. See help(type(self)) for accurate signature.

exception simpleml.utils.MetricError(*args, **kwargs)[source]

Bases: simpleml.utils.errors.SimpleMLError

Common base class for all non-exit exceptions.

Initialize self. See help(type(self)) for accurate signature.

exception simpleml.utils.ModelError(*args, **kwargs)[source]

Bases: simpleml.utils.errors.SimpleMLError

Common base class for all non-exit exceptions.

Initialize self. See help(type(self)) for accurate signature.

exception simpleml.utils.PipelineError(*args, **kwargs)[source]

Bases: simpleml.utils.errors.SimpleMLError

Common base class for all non-exit exceptions.

Initialize self. See help(type(self)) for accurate signature.

exception simpleml.utils.ScoringError(*args, **kwargs)[source]

Bases: simpleml.utils.errors.SimpleMLError

Common base class for all non-exit exceptions.

Initialize self. See help(type(self)) for accurate signature.

exception simpleml.utils.SimpleMLError[source]

Bases: Exception

Common base class for all non-exit exceptions.

Initialize self. See help(type(self)) for accurate signature.

__str__(self)

Return str(self).

exception simpleml.utils.TrainingError(*args, **kwargs)[source]

Bases: simpleml.utils.errors.SimpleMLError

Common base class for all non-exit exceptions.

Initialize self. See help(type(self)) for accurate signature.
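Because every exception above derives from SimpleMLError, a caller can handle one specific failure type first and fall back to the base class for everything else. The sketch below mirrors the documented hierarchy with local stand-in classes so that it runs without SimpleML installed; the `train_model` and `safe_train` helpers are hypothetical and not part of the library.

```python
# Local stand-ins that mirror the documented hierarchy; in real code
# these names would be imported from simpleml.utils instead.
class SimpleMLError(Exception):
    pass


class TrainingError(SimpleMLError):
    pass


class ScoringError(SimpleMLError):
    pass


def train_model(has_dataset):
    """Hypothetical helper that fails when its dependency is missing."""
    if not has_dataset:
        raise TrainingError("dataset dependency does not exist")
    return "trained"


def safe_train(has_dataset):
    """Handle the specific error first, then any other SimpleML error."""
    try:
        return train_model(has_dataset)
    except TrainingError as exc:
        return "training failed: {}".format(exc)
    except SimpleMLError as exc:
        return "other SimpleML failure: {}".format(exc)
```

Ordering the `except` clauses from most to least specific matters here: placing SimpleMLError first would swallow the TrainingError branch.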

class simpleml.utils.AlembicDatabase(alembic_filepath, script_location='migrations', *args, **kwargs)[source]

Bases: simpleml.utils.initialization.BaseDatabase

Base database class to manage dbs with schema tracking. Includes alembic config references

Parameters
  • use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

property alembic_config(self)

create_tables(self, base, drop_tables=False, ignore_errors=False)

Creates database tables (and potentially drops existing ones). Assumes it is run by a sufficiently privileged user.

Parameters

drop_tables – Whether or not to drop the existing tables first.

Returns

None

downgrade(self, revision)

Proxy method to invoke the alembic downgrade command to the specified revision. Indirectly runs the alembic env.py code.

initialize(self, base_list, upgrade=False, validate=True, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

upgrade(self, revision='head')

Proxy method to invoke the alembic upgrade command to the specified revision. Indirectly runs the alembic env.py code.

validate_schema_version(self, base_list)

Check that the newly initialized database is up to date. Raises an error otherwise (ahead of any table model mismatches later).

class simpleml.utils.BaseDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)[source]

Bases: object

Base Database class to configure db connection. Does not assume schema tracking or any other validation.

Starting in sqlalchemy 1.4.2, sqlalchemy.engine.url.URL changed to an immutable object without an __init__.

Parameters
  • use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough
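As a rough illustration of the two tunnel parameters, the dictionary below uses keyword names accepted by sshtunnel's SSHTunnelForwarder. The host names, user name, and key path are placeholders, and how BaseDatabase combines these values with the database credentials is not described here, so treat this as a sketch under those assumptions.

```python
# Keyword arguments for sshtunnel.SSHTunnelForwarder; every value is a placeholder.
sshtunnel_params = {
    "ssh_address_or_host": ("bastion.example.com", 22),
    "ssh_username": "deploy",
    "ssh_pkey": "~/.ssh/id_rsa",
    "remote_bind_address": ("db.internal.example.com", 5432),
}

# Hypothetical keyword arguments one might hand to a BaseDatabase subclass.
database_kwargs = {
    "use_ssh_tunnel": True,
    "sshtunnel_params": sshtunnel_params,
}
```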

__repr__(self)

Return repr(self).

__str__(self)

Return str(self).

_initialize(self, base, create_tables=False, **kwargs)

Initialization method to set up database connection and inject session manager

Parameters
  • create_tables – Bool, whether to create tables in database

  • drop_tables – Bool, whether to drop existing tables in database

Returns

None

close_tunnel(self)

configure_ssh_tunnel(self, credentials, ssh_config)

create_tables(self, base, drop_tables=False, ignore_errors=False)

Creates database tables (and potentially drops existing ones). Assumes it is run by a sufficiently privileged user.

Parameters

drop_tables – Whether or not to drop the existing tables first.

Returns

None

property engine(self)

initialize(self, base_list, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

open_tunnel(self)

property ssh_tunnel(self)

class simpleml.utils.BinaryStorageDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)[source]

Bases: simpleml.utils.initialization.BaseDatabase

Hardcoded database mapped to binary storage metadata

Parameters
  • use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.utils.Database(configuration_section=None, uri=None, database=None, username=None, password=None, drivername=None, host=None, port=None, query=None, *args, **kwargs)[source]

Bases: simpleml.utils.initialization.AlembicDatabase

SimpleML specific configuration to interact with the database. Defaults to a sqlite db in the filestore directory.

Parameters
  • use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.utils.DatasetCreator[source]

Bases: simpleml.utils.training.create_persistable.PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name – Class name registered in SimpleML

  • dataset_pipeline – dataset pipeline object

classmethod determine_filters(cls, strict=True, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict – whether to assume same class and name = same persistable, or load the data and compare the hash

class simpleml.utils.DatasetDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)[source]

Bases: simpleml.utils.initialization.BaseDatabase

Hardcoded database mapped to dataset storage metadata

Parameters
  • use_ssh_tunnel – boolean - default false. Whether to tunnel sqlalchemy connection through an ssh tunnel or not

  • sshtunnel_params – Dict of ssh params - specify according to sshtunnel project https://github.com/pahaz/sshtunnel/ - direct passthrough

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if database schema is not up to date

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.utils.MetricCreator[source]

Bases: simpleml.utils.training.create_persistable.PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name – Class name registered in SimpleML

  • model – model class

classmethod determine_filters(cls, strict=False, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict – whether to fit objects first before assuming they are identical

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter). So you don't need to fit objects to be sure they are the same.
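The reasoning behind a non-strict comparison can be sketched with a deterministic fingerprint: when the class name and inputs are identical, the fingerprint is identical, so two configurations can be compared without fitting anything. The `fingerprint` function and the sample parameter names below are hypothetical; this is not SimpleML's hashing routine.

```python
import hashlib
import json


def fingerprint(registered_name, **params):
    """Hypothetical helper: hash a class name plus its parameters."""
    # sort_keys makes the serialization independent of keyword order.
    payload = json.dumps(
        {"registered_name": registered_name, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same class and inputs, different keyword order: identical fingerprints.
first = fingerprint("LogisticRegression", penalty="l2", C=1.0)
second = fingerprint("LogisticRegression", C=1.0, penalty="l2")

# A changed input produces a different fingerprint.
different = fingerprint("LogisticRegression", penalty="l1", C=1.0)
```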

class simpleml.utils.ModelCreator[source]

Bases: simpleml.utils.training.create_persistable.PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name – Class name registered in SimpleML

  • pipeline – pipeline object

classmethod determine_filters(cls, strict=False, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict – whether to fit objects first before assuming they are identical

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter). So you don't need to fit objects to be sure they are the same.

class simpleml.utils.PersistableCreator[source]

Bases: future.utils.with_metaclass()

abstract create(cls, **kwargs)

Method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

abstract determine_filters(cls, strict=False, **kwargs)

Method to determine which filters to apply when looking for an existing persistable

Parameters

strict – whether to fit objects first before assuming they are identical

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter). So you don't need to fit objects to be sure they are the same.

Default design iterates through 2 (or 3) options when retrieving persistables:
  1. By name and version (unique properties that define persistables)

  2. By name, registered_name, and computed hash

  2.5. Optionally, just use name and registered_name (assumes class definition is the same and would result in an identical persistable)

Returns: database class, filter dictionary
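The lookup order above can be sketched as a small pure function. This is an illustrative reading of the description, not the library's code: `choose_filters` and its arguments are hypothetical, treating `strict=True` as the hash-comparison option is an assumption, and the sketch returns only the filter dictionary rather than the documented (database class, filter dictionary) pair.

```python
def choose_filters(name, registered_name, version=None,
                   computed_hash=None, strict=False):
    """Hypothetical helper returning a filter dictionary for a lookup."""
    if version is not None:
        # Option 1: name and version uniquely define a persistable.
        return {"name": name, "version": version}
    if strict:
        # Option 2: compare on the computed hash as well.
        if computed_hash is None:
            raise ValueError("strict lookup needs a computed hash")
        return {"name": name, "registered_name": registered_name,
                "hash": computed_hash}
    # Option 2.5: assume the same class definition yields the same persistable.
    return {"name": name, "registered_name": registered_name}
```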

static retrieve(cls, filters)

Query database using the table model (cls) and filters for a matching persistable

classmethod retrieve_dataset(cls, dataset=None, dataset_id: str = None, dataset_kwargs=None, **kwargs)

static retrieve_dependency(dependency_cls, **dependency_kwargs)

Base method to query for a dependency. Raises TrainingError if the dependency does not exist.

static retrieve_from_registry(registered_name)

Stateless method to query the registry for class definitions. Handles errors.

classmethod retrieve_model(cls, model=None, model_id: str = None, model_kwargs=None, **kwargs)

classmethod retrieve_or_create(self, **kwargs)

Wrapper method to first attempt to retrieve a matching persistable and then create a new one if it isn’t found

classmethod retrieve_pipeline(cls, pipeline=None, pipeline_id: str = None, pipeline_kwargs=None, **kwargs)

class simpleml.utils.PersistableLoader[source]

Bases: object

Wrapper class to load various persistables

Sqlalchemy-mixins active record style allows for keyword based filtering:

BaseClass.where(**filters).order_by(**ordering).first()
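To make the keyword-based filtering idea concrete, here is a minimal in-memory imitation of that call chain. It does not use sqlalchemy-mixins or a database; the `Query` class and the sample records are hypothetical, and `order_by` here takes a single field name rather than mirroring any real signature.

```python
class Query:
    """Hypothetical in-memory imitation of an active-record query chain."""

    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, **filters):
        # Keep rows whose fields equal every keyword filter.
        matches = [
            row for row in self.rows
            if all(row.get(key) == value for key, value in filters.items())
        ]
        return Query(matches)

    def order_by(self, field, descending=False):
        # Return a new query with rows sorted on one field.
        return Query(sorted(self.rows, key=lambda row: row[field],
                            reverse=descending))

    def first(self):
        # Return the first row, or None when nothing matched.
        return self.rows[0] if self.rows else None


records = [
    {"name": "default", "version": 1},
    {"name": "default", "version": 2},
    {"name": "other", "version": 1},
]

# Latest version of the persistable named "default".
latest = Query(records).where(name="default").order_by(
    "version", descending=True).first()
```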

classmethod load_dataset(cls, name='default', **filters)

classmethod load_metric(cls, name, model_id, **filters)

classmethod load_model(cls, name='default', **filters)

classmethod load_persistable(cls, persistable_class, filters)

classmethod load_pipeline(cls, name='default', **filters)

static validate_environment(persistable)

class simpleml.utils.PipelineCreator[source]

Bases: simpleml.utils.training.create_persistable.PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name – Class name registered in SimpleML

  • dataset – dataset object

classmethod determine_filters(cls, strict=False, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict – whether to fit objects first before assuming they are identical

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter). So you don't need to fit objects to be sure they are the same.