simpleml.utils

Utilities

Package Contents

Classes

AlembicDatabase

Base database class to manage databases with schema tracking. Includes Alembic config references

BaseDatabase

Base database class to configure a database connection

BinaryStorageDatabase

Hardcoded database mapped to binary storage metadata

Database

SimpleML-specific configuration for interacting with the database

DatasetCreator

DatasetDatabase

Hardcoded database mapped to dataset storage metadata

MetricCreator

ModelCreator

PersistableCreator

PersistableLoader

Wrapper class to load various persistables

PipelineCreator

Attributes

CONFIG

FILESTORE_DIRECTORY

SIMPLEML_DIRECTORY

__author__

simpleml.utils.CONFIG
simpleml.utils.FILESTORE_DIRECTORY
simpleml.utils.SIMPLEML_DIRECTORY
simpleml.utils.__author__ = 'Elisha Yadgaran'
exception simpleml.utils.DatasetError(*args, **kwargs)

Bases: SimpleMLError

SimpleML exception for dataset-related errors.

exception simpleml.utils.MetricError(*args, **kwargs)

Bases: SimpleMLError

SimpleML exception for metric-related errors.

exception simpleml.utils.ModelError(*args, **kwargs)

Bases: SimpleMLError

SimpleML exception for model-related errors.

exception simpleml.utils.PipelineError(*args, **kwargs)

Bases: SimpleMLError

SimpleML exception for pipeline-related errors.

exception simpleml.utils.ScoringError(*args, **kwargs)

Bases: SimpleMLError

SimpleML exception for scoring-related errors.

exception simpleml.utils.SimpleMLError

Bases: Exception

Base class for the SimpleML exceptions in this module. The other exceptions documented here derive from it.

__str__(self)

Return str(self).

exception simpleml.utils.TrainingError(*args, **kwargs)

Bases: SimpleMLError

SimpleML exception for training-related errors. It is also raised by PersistableCreator.retrieve_dependency when a dependency does not exist.
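Every exception above derives from SimpleMLError. A caller can therefore handle one specific failure, or catch any SimpleML failure with a single handler. The sketch below defines local stand-ins that mirror the documented hierarchy, so it runs without SimpleML installed. In real code the classes would be imported from simpleml.utils. The run_step helper is hypothetical.

```python
class SimpleMLError(Exception):
    """Local stand-in mirroring simpleml.utils.SimpleMLError."""


class TrainingError(SimpleMLError):
    """Local stand-in mirroring simpleml.utils.TrainingError."""


class ScoringError(SimpleMLError):
    """Local stand-in mirroring simpleml.utils.ScoringError."""


def run_step(step):
    """Hypothetical helper: run a callable and report which kind of failure occurred."""
    try:
        step()
    except TrainingError as exc:
        # Most specific handler first: only training failures land here
        return f"training failed: {exc}"
    except SimpleMLError as exc:
        # Any other SimpleML exception (e.g. ScoringError) is caught via the base class
        return f"simpleml failure ({type(exc).__name__}): {exc}"
    return "ok"


def fail_training():
    raise TrainingError("dependency not found")


def fail_scoring():
    raise ScoringError("metric could not be computed")


print(run_step(fail_training))  # training failed: dependency not found
print(run_step(fail_scoring))   # simpleml failure (ScoringError): metric could not be computed
print(run_step(lambda: None))   # ok
```

Order matters in the handlers. Python uses the first matching `except` clause, so the base-class handler must come after the more specific ones.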

class simpleml.utils.AlembicDatabase(alembic_filepath, script_location='migrations', *args, **kwargs)

Bases: BaseDatabase

Base database class to manage databases with schema tracking. Includes Alembic config references.

Parameters
  • use_ssh_tunnel – bool, default False. Whether to tunnel the SQLAlchemy connection through an SSH tunnel

  • sshtunnel_params – Dict of SSH parameters, specified according to the sshtunnel project (https://github.com/pahaz/sshtunnel/) and passed through directly

property alembic_config(self)
create_tables(self, base, drop_tables=False, ignore_errors=False)

Creates database tables (and optionally drops existing ones first). Assumes it is running as a sufficiently privileged user.

Parameters

drop_tables – Whether or not to drop the existing tables first.

Returns

None

downgrade(self, revision)

Proxy method to invoke the Alembic downgrade command to the specified revision. Indirectly runs the Alembic env.py code.

initialize(self, base_list, upgrade=False, validate=True, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if the database schema is not up to date.

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

upgrade(self, revision='head')

Proxy method to invoke the Alembic upgrade command to the specified revision. Indirectly runs the Alembic env.py code.

validate_schema_version(self, base_list)

Checks that the newly initialized database is up to date. Raises an error otherwise, ahead of any later table model mismatches.
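This is a rough, stdlib-only sketch of what a schema-version check like validate_schema_version involves. Alembic records the applied revision in the single-row alembic_version table, in the version_num column. The sketch reads that value with sqlite3 and compares it to an expected head revision. The function names and SchemaVersionError are hypothetical. The real method presumably relies on Alembic's own APIs, such as MigrationContext.get_current_revision() and ScriptDirectory.get_current_head(), rather than raw SQL.

```python
import sqlite3
from typing import Optional


class SchemaVersionError(Exception):
    """Hypothetical stand-in for the error raised on an out-of-date schema."""


def current_revision(conn: sqlite3.Connection) -> Optional[str]:
    # Alembic keeps the applied revision in a single-row alembic_version table
    try:
        row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    except sqlite3.OperationalError:
        # No alembic_version table: the database has never been stamped by Alembic
        return None
    return row[0] if row else None


def check_schema_version(conn: sqlite3.Connection, head_revision: str) -> None:
    # Fail fast, before any ORM query runs against a mismatched table definition
    current = current_revision(conn)
    if current != head_revision:
        raise SchemaVersionError(
            f"database is at revision {current!r}, expected {head_revision!r}; upgrade required"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL PRIMARY KEY)")
conn.execute("INSERT INTO alembic_version (version_num) VALUES ('abc123')")
check_schema_version(conn, "abc123")  # up to date: no exception
```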

class simpleml.utils.BaseDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)

Bases: object

Base database class to configure a database connection. Does not assume schema tracking or any other validation.

Starting in SQLAlchemy 1.4.2, sqlalchemy.engine.url.URL is an immutable object without an __init__, so its construction signature has changed.

Parameters
  • use_ssh_tunnel (bool) – default False. Whether to tunnel the SQLAlchemy connection through an SSH tunnel

  • sshtunnel_params (Optional[Dict[str, Any]]) – Dict of SSH parameters, specified according to the sshtunnel project (https://github.com/pahaz/sshtunnel/) and passed through directly

  • config (Optional[Dict[str, Any]]) –

  • configuration_section (Optional[str]) –

  • uri (Optional[str]) –

__repr__(self)

Return repr(self).

__str__(self)

Return str(self).

_initialize(self, base, create_tables=False, **kwargs)

Initialization method to set up database connection and inject session manager

Parameters
  • create_tables – Bool, whether to create tables in database

  • drop_tables – Bool, whether to drop existing tables in database

Returns

None

close_tunnel(self)
Return type

None

configure_ssh_tunnel(self, credentials, ssh_config)
Parameters
  • credentials (Dict[str, Any]) –

  • ssh_config (Dict[str, Any]) –

Return type

Tuple[Dict[str, Any], Dict[str, Any]]

create_tables(self, base, drop_tables=False, ignore_errors=False)

Creates database tables (and optionally drops existing ones first). Assumes it is running as a sufficiently privileged user.

Parameters
  • drop_tables (bool) – Whether or not to drop the existing tables first.

  • ignore_errors (bool) –

Returns

None

Return type

None

property engine(self)
Return type

Any

initialize(self, base_list, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if the database schema is not up to date.

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

open_tunnel(self)
Return type

None

property ssh_tunnel(self)
Return type

simpleml.imports.SSHTunnelForwarder

class simpleml.utils.BinaryStorageDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)

Bases: BaseDatabase

Hardcoded database mapped to binary storage metadata

Parameters
  • use_ssh_tunnel (bool) – default False. Whether to tunnel the SQLAlchemy connection through an SSH tunnel

  • sshtunnel_params (Optional[Dict[str, Any]]) – Dict of SSH parameters, specified according to the sshtunnel project (https://github.com/pahaz/sshtunnel/) and passed through directly

  • config (Optional[Dict[str, Any]]) –

  • configuration_section (Optional[str]) –

  • uri (Optional[str]) –

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if the database schema is not up to date.

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.utils.Database(configuration_section=None, uri=None, database=None, username=None, password=None, drivername=None, host=None, port=None, query=None, *args, **kwargs)

Bases: AlembicDatabase

SimpleML-specific configuration for interacting with the database. Defaults to a SQLite database in the filestore directory.

Parameters
  • use_ssh_tunnel – bool, default False. Whether to tunnel the SQLAlchemy connection through an SSH tunnel

  • sshtunnel_params – Dict of SSH parameters, specified according to the sshtunnel project (https://github.com/pahaz/sshtunnel/) and passed through directly
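Database falls back to a SQLite file in the filestore directory when no other connection details are supplied. The sketch below shows, for illustration only, how a SQLAlchemy-style URI for such a file could be assembled. In practice the directory would be simpleml.utils.FILESTORE_DIRECTORY. The filename SimpleML.db and the default_sqlite_uri helper are hypothetical; this reference does not document the actual default filename.

```python
import os


def default_sqlite_uri(filestore_directory: str, filename: str = "SimpleML.db") -> str:
    """Build a SQLAlchemy-style SQLite URI for a file in the filestore directory.

    `filename` is a hypothetical default used only for illustration.
    """
    path = os.path.abspath(os.path.join(filestore_directory, filename))
    # SQLAlchemy SQLite URLs are "sqlite:///" followed by the file path,
    # so an absolute POSIX path produces four slashes in total
    return "sqlite:///" + path


print(default_sqlite_uri("/tmp/simpleml/filestore"))
# on POSIX: sqlite:////tmp/simpleml/filestore/SimpleML.db
```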

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if the database schema is not up to date.

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.utils.DatasetCreator

Bases: PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name (str) – Class name registered in SimpleML

  • dataset_pipeline – dataset pipeline object

Return type

simpleml.datasets.base_dataset.Dataset

classmethod determine_filters(cls, strict=True, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable.

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict (bool) – whether to assume that the same class and name mean the same persistable, or to load the data and compare the hash

Return type

Tuple[simpleml.datasets.base_dataset.Dataset, Dict[str, Any]]
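Under the hash-comparison option, two datasets count as the same persistable only if their data hashes match. The following minimal sketch illustrates content hashing. It assumes a canonical JSON serialization and SHA-256. This reference does not describe SimpleML's actual hashing scheme, which may differ.

```python
import hashlib
import json
from typing import Any


def content_hash(data: Any) -> str:
    # sort_keys plus fixed separators make the serialization independent of dict key order
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"x": [1, 2, 3], "y": [4, 5, 6]}
b = {"y": [4, 5, 6], "x": [1, 2, 3]}  # same content, different key order
c = {"x": [1, 2, 3], "y": [4, 5, 7]}  # one value changed

print(content_hash(a) == content_hash(b))  # True
print(content_hash(a) == content_hash(c))  # False
```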

class simpleml.utils.DatasetDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)

Bases: BaseDatabase

Hardcoded database mapped to dataset storage metadata

Parameters
  • use_ssh_tunnel (bool) – default False. Whether to tunnel the SQLAlchemy connection through an SSH tunnel

  • sshtunnel_params (Optional[Dict[str, Any]]) – Dict of SSH parameters, specified according to the sshtunnel project (https://github.com/pahaz/sshtunnel/) and passed through directly

  • config (Optional[Dict[str, Any]]) –

  • configuration_section (Optional[str]) –

  • uri (Optional[str]) –

initialize(self, base_list=None, **kwargs)

Initialization method to set up database connection and inject session manager

Raises a SimpleML error if the database schema is not up to date.

Parameters
  • drop_tables – Bool, whether to drop existing tables in database

  • upgrade – Bool, whether to run an upgrade migration after establishing a connection

Returns

None

class simpleml.utils.MetricCreator

Bases: PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name (str) – Class name registered in SimpleML

  • model – model class

Return type

simpleml.metrics.base_metric.Metric

classmethod determine_filters(cls, strict=False, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable.

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict (bool) – whether to fit objects first before assuming they are identical

Return type

Tuple[simpleml.metrics.base_metric.Metric, Dict[str, Any]]

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to randomness), so objects do not need to be fit to confirm they are identical.

class simpleml.utils.ModelCreator

Bases: PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name (str) – Class name registered in SimpleML

  • pipeline – pipeline object

Return type

simpleml.models.base_model.Model

classmethod determine_filters(cls, strict=False, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable.

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict (bool) – whether to fit objects first before assuming they are identical

Return type

Tuple[simpleml.models.base_model.Model, Dict[str, Any]]

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to randomness), so objects do not need to be fit to confirm they are identical.

class simpleml.utils.PersistableCreator

Bases: with_metaclass(ABCMeta, object)

abstract create(cls, **kwargs)

Method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

abstract determine_filters(cls, strict=False, **kwargs)

Method to determine which filters to apply when looking for an existing persistable.

Parameters

strict (bool) – whether to fit objects first before assuming they are identical

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to randomness), so objects do not need to be fit to confirm they are identical.

Default design iterates through 2 (or 3) options when retrieving persistables:
  1. By name and version (unique properties that define persistables)

  2. By name, registered_name, and computed hash

  2.5. Optionally, by name and registered_name only (assumes the class definition is the same and would result in an identical persistable)

Returns: database class, filter dictionary
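The default lookup order can be sketched as a cascade of filter dictionaries, tried in turn against stored records. This is an in-memory illustration only. The record fields follow the names used above, but several details are assumptions rather than SimpleML's storage layer: the list-of-dicts store, the `hash_` field name, the helper names, the "ExampleDataset" registered name, and the choice to enable option 2.5 only when strict=False.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def matches(record: Record, filters: Dict[str, Any]) -> bool:
    # A record matches when every filter key has an equal value on the record
    return all(record.get(key) == value for key, value in filters.items())


def candidate_filters(name: str, registered_name: str, version: Optional[int] = None,
                      hash_: Optional[str] = None, strict: bool = True) -> List[Dict[str, Any]]:
    candidates = []
    if version is not None:
        # Option 1: name + version uniquely identify a persistable
        candidates.append({"name": name, "version": version})
    if hash_ is not None:
        # Option 2: name + registered_name + computed hash
        candidates.append({"name": name, "registered_name": registered_name, "hash_": hash_})
    if not strict:
        # Option 2.5 (assumed to be gated on strict=False): trust that the same
        # class definition yields an identical persistable
        candidates.append({"name": name, "registered_name": registered_name})
    return candidates


def find(records: List[Record], candidates: List[Dict[str, Any]]) -> Optional[Record]:
    # Try each filter dict in order; for the first one that hits, return the newest version
    for filters in candidates:
        hits = [r for r in records if matches(r, filters)]
        if hits:
            return max(hits, key=lambda r: r["version"])
    return None


records = [
    {"name": "iris", "registered_name": "ExampleDataset", "version": 1, "hash_": "aaa"},
    {"name": "iris", "registered_name": "ExampleDataset", "version": 2, "hash_": "bbb"},
]
print(find(records, candidate_filters("iris", "ExampleDataset", hash_="bbb"))["version"])  # 2
print(find(records, candidate_filters("iris", "ExampleDataset", hash_="zzz")))  # None
print(find(records, candidate_filters("iris", "ExampleDataset", hash_="zzz", strict=False))["version"])  # 2
```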

static retrieve(cls, filters)

Query database using the table model (cls) and filters for a matching persistable

Parameters

filters (Dict[str, Any]) –

Return type

simpleml.persistables.base_persistable.Persistable

classmethod retrieve_dataset(cls, dataset=None, dataset_id=None, dataset_kwargs=None, **kwargs)
Return type

simpleml.datasets.base_dataset.Dataset

static retrieve_dependency(dependency_cls, **dependency_kwargs)

Base method to query for a dependency. Raises TrainingError if the dependency does not exist.

Parameters

dependency_cls (PersistableCreator) –

Return type

simpleml.persistables.base_persistable.Persistable
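retrieve_dataset, retrieve_model, and retrieve_pipeline each accept an already loaded object, a database id, or a dict of keyword arguments. retrieve_dependency raises TrainingError when nothing matches. The sketch below shows one plausible way to resolve those three inputs. Several parts are assumptions for illustration: the precedence order (object, then id, then kwargs), the dict-backed STORE, and the resolve_dependency name. TrainingError is a local stand-in.

```python
from typing import Any, Dict, Optional


class TrainingError(Exception):
    """Local stand-in for simpleml.utils.TrainingError."""


# Hypothetical store: id -> persistable (plain dicts here)
STORE: Dict[str, Dict[str, Any]] = {
    "ds-1": {"id": "ds-1", "name": "iris", "version": 1},
}


def lookup(**filters: Any) -> Optional[Dict[str, Any]]:
    # Return the first stored item whose fields equal every filter value
    for item in STORE.values():
        if all(item.get(k) == v for k, v in filters.items()):
            return item
    return None


def resolve_dependency(obj=None, obj_id=None, obj_kwargs=None):
    if obj is not None:
        # An already loaded object wins: no lookup needed
        return obj
    if obj_id is not None:
        found = STORE.get(obj_id)
    elif obj_kwargs:
        found = lookup(**obj_kwargs)
    else:
        raise TrainingError("no dependency specified: pass an object, an id, or kwargs")
    if found is None:
        raise TrainingError("dependency does not exist")
    return found


print(resolve_dependency(obj_id="ds-1")["name"])  # iris
```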

static retrieve_from_registry(registered_name)

Stateless method to query the registry for class definitions. Handles errors.

Parameters

registered_name (str) –

Return type

simpleml.persistables.base_persistable.Persistable
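retrieve_from_registry looks a class up by its registered name and turns a missing entry into a clear error. The following minimal sketch shows that registry pattern, assuming a plain module-level dict and a registration decorator. This reference does not show SimpleML's actual registry object or error type. REGISTRY, register, ExampleModel, and the ValueError raised here are all hypothetical.

```python
from typing import Dict

REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    # Class decorator: record the class under its own name
    REGISTRY[cls.__name__] = cls
    return cls


def retrieve_from_registry(registered_name: str) -> type:
    try:
        return REGISTRY[registered_name]
    except KeyError:
        # Replace the bare KeyError with a message listing what is registered
        known = ", ".join(sorted(REGISTRY)) or "<none>"
        raise ValueError(
            f"{registered_name!r} is not registered; known classes: {known}"
        ) from None


@register
class ExampleModel:
    pass


print(retrieve_from_registry("ExampleModel").__name__)  # ExampleModel
```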

classmethod retrieve_model(cls, model=None, model_id=None, model_kwargs=None, **kwargs)
Return type

simpleml.models.base_model.Model

classmethod retrieve_or_create(self, **kwargs)

Wrapper method to first attempt to retrieve a matching persistable and then create a new one if it isn’t found

Return type

simpleml.persistables.base_persistable.Persistable
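The retrieve-then-create flow can be sketched with an in-memory list standing in for the database. The real method presumably derives its filters via determine_filters and builds new objects via create. The Persistable dataclass, the InMemoryCreator class, and its matching rule are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Persistable:
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


class InMemoryCreator:
    """Hypothetical creator: looks for an existing match before building a new object."""

    def __init__(self) -> None:
        self.store: List[Persistable] = []
        self.created = 0

    def determine_filters(self, name: str, **params: Any) -> Dict[str, Any]:
        return {"name": name, "params": params}

    def retrieve(self, filters: Dict[str, Any]) -> Optional[Persistable]:
        for item in self.store:
            if item.name == filters["name"] and item.params == filters["params"]:
                return item
        return None

    def create(self, name: str, **params: Any) -> Persistable:
        self.created += 1
        item = Persistable(name, params)
        self.store.append(item)
        return item

    def retrieve_or_create(self, name: str, **params: Any) -> Persistable:
        existing = self.retrieve(self.determine_filters(name, **params))
        if existing is not None:
            return existing  # reuse the stored object; nothing new is built
        return self.create(name, **params)


creator = InMemoryCreator()
first = creator.retrieve_or_create("iris_model", max_depth=3)
second = creator.retrieve_or_create("iris_model", max_depth=3)
third = creator.retrieve_or_create("iris_model", max_depth=5)
print(first is second, creator.created)  # True 2
```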

classmethod retrieve_pipeline(cls, pipeline=None, pipeline_id=None, pipeline_kwargs=None, **kwargs)
Return type

simpleml.pipelines.base_pipeline.Pipeline

class simpleml.utils.PersistableLoader

Bases: object

Wrapper class to load various persistables

The sqlalchemy-mixins active record style allows for keyword-based filtering:

BaseClass.where(**filters).order_by(*ordering).first()
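In the active record chain, rows are filtered by keyword equality, then sorted, and the first row is taken. The pure-Python emulation below illustrates those semantics over a list of dicts. The Query class and its where, order_by, and first methods are hypothetical stand-ins, not the sqlalchemy-mixins implementation. The real chain builds and executes a SQL query.

```python
from typing import Any, Dict, List, Optional


class Query:
    """Hypothetical in-memory stand-in for an active record query chain."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def where(self, **filters: Any) -> "Query":
        # Keep rows whose columns equal every keyword filter
        return Query([r for r in self.rows if all(r.get(k) == v for k, v in filters.items())])

    def order_by(self, column: str, descending: bool = False) -> "Query":
        return Query(sorted(self.rows, key=lambda r: r[column], reverse=descending))

    def first(self) -> Optional[Dict[str, Any]]:
        return self.rows[0] if self.rows else None


models = Query([
    {"name": "iris_model", "version": 1},
    {"name": "iris_model", "version": 3},
    {"name": "titanic_model", "version": 2},
])
latest = models.where(name="iris_model").order_by("version", descending=True).first()
print(latest)  # {'name': 'iris_model', 'version': 3}
```

Sorting by version in descending order before calling first() is a common way to fetch the latest version of a named persistable.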

classmethod load_dataset(cls, **filters)
Return type

simpleml.datasets.base_dataset.Dataset

classmethod load_metric(cls, **filters)
Return type

simpleml.metrics.base_metric.Metric

classmethod load_model(cls, **filters)
Return type

simpleml.models.base_model.Model

classmethod load_persistable(cls, persistable_class, filters)
Return type

simpleml.persistables.base_persistable.Persistable

classmethod load_pipeline(cls, **filters)
Return type

simpleml.pipelines.base_pipeline.Pipeline

static validate_environment(persistable)
Parameters

persistable (simpleml.persistables.base_persistable.Persistable) –

Return type

None

class simpleml.utils.PipelineCreator

Bases: PersistableCreator

classmethod create(cls, registered_name, **kwargs)

Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.

Parameters
  • registered_name (str) – Class name registered in SimpleML

  • dataset – dataset object

Return type

simpleml.pipelines.base_pipeline.Pipeline

classmethod determine_filters(cls, strict=False, **kwargs)

Stateless method to determine which filters to apply when looking for an existing persistable.

Returns: database class, filter dictionary

Parameters
  • registered_name – Class name registered in SimpleML

  • strict (bool) – whether to fit objects first before assuming they are identical

Return type

Tuple[simpleml.pipelines.base_pipeline.Pipeline, Dict[str, Any]]

In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to randomness), so objects do not need to be fit to confirm they are identical.