simpleml.utils
Utilities
Subpackages
Submodules
Package Contents
Classes
AlembicDatabase | Base database class to manage dbs with schema tracking. Includes alembic config references
BaseDatabase | Base Database class to configure db connection
BinaryStorageDatabase | Hardcoded database mapped to binary storage metadata
Database | SimpleML specific configuration to interact with the database
DatasetDatabase | Hardcoded database mapped to dataset storage metadata
PersistableLoader | Wrapper class to load various persistables
-
exception simpleml.utils.DatasetError(*args, **kwargs)
Bases: simpleml.utils.errors.SimpleMLError
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
-
exception simpleml.utils.MetricError(*args, **kwargs)
Bases: simpleml.utils.errors.SimpleMLError
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
-
exception simpleml.utils.ModelError(*args, **kwargs)
Bases: simpleml.utils.errors.SimpleMLError
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
-
exception simpleml.utils.PipelineError(*args, **kwargs)
Bases: simpleml.utils.errors.SimpleMLError
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
-
exception simpleml.utils.ScoringError(*args, **kwargs)
Bases: simpleml.utils.errors.SimpleMLError
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
-
exception simpleml.utils.SimpleMLError
Bases: Exception
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
-
__str__(self)
Return str(self).
-
-
exception simpleml.utils.TrainingError(*args, **kwargs)
Bases: simpleml.utils.errors.SimpleMLError
Common base class for all non-exit exceptions.
Initialize self. See help(type(self)) for accurate signature.
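Because every exception listed above derives from SimpleMLError, a single except clause on the base class also catches the more specific errors. The sketch below uses local stand-in classes that mirror the documented hierarchy so that it runs without SimpleML installed; `run_step` and `safe_run` are hypothetical helpers, not part of the library.

```python
# Stand-ins mirroring the documented hierarchy. In real code these would be
# imported from simpleml.utils instead of being redefined here.
class SimpleMLError(Exception):
    pass


class TrainingError(SimpleMLError):
    pass


class DatasetError(SimpleMLError):
    pass


def run_step(step_name):
    """Hypothetical step that fails with a specific error type."""
    if step_name == "train":
        raise TrainingError("model dependency is missing")
    if step_name == "load":
        raise DatasetError("dataset could not be built")
    return "ok"


def safe_run(step_name):
    """Catch any library error through the shared base class."""
    try:
        return run_step(step_name)
    except SimpleMLError as exc:
        # type(exc).__name__ preserves which subclass was raised
        return f"{type(exc).__name__}: {exc}"


print(safe_run("train"))
print(safe_run("load"))
print(safe_run("score"))
```

The real SimpleMLError overrides `__str__`, so the message text produced by the library may be formatted differently from these stand-ins.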
-
class simpleml.utils.AlembicDatabase(alembic_filepath, script_location='migrations', *args, **kwargs)
Bases: simpleml.utils.initialization.BaseDatabase
Base database class to manage databases with schema tracking. Includes alembic config references.
- Parameters
use_ssh_tunnel – Boolean, default False. Whether to tunnel the sqlalchemy connection through an SSH tunnel.
sshtunnel_params – Dict of SSH parameters, passed through directly. Specify according to the sshtunnel project: https://github.com/pahaz/sshtunnel/
-
property alembic_config(self)
-
create_tables(self, base, drop_tables=False, ignore_errors=False)
Creates database tables (and optionally drops existing ones first). Assumes it is run by a sufficiently privileged user.
- Parameters
drop_tables – Whether to drop the existing tables first.
- Returns
None
-
downgrade(self, revision)
Proxy method that invokes the alembic downgrade command to the specified revision. Indirectly runs the alembic env.py code.
-
initialize(self, base_list, upgrade=False, validate=True, **kwargs)
Initialization method to set up the database connection and inject the session manager.
Raises a SimpleML error if the database schema is not up to date.
- Parameters
drop_tables – Bool, whether to drop existing tables in the database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
-
upgrade(self, revision='head')
Proxy method that invokes the alembic upgrade command to the specified revision. Indirectly runs the alembic env.py code.
-
validate_schema_version(self, base_list)
Checks that the newly initialized database is up to date and raises an error otherwise (ahead of any table model mismatches later).
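A check like the one described above amounts to comparing the revision recorded in the database with the newest revision known to the migration scripts. The following is a minimal sketch, assuming both revisions are already available as plain strings; `check_schema_version` and the stand-in `SimpleMLError` are hypothetical and do not reproduce the library's internals.

```python
class SimpleMLError(Exception):
    """Stand-in for simpleml.utils.SimpleMLError so the sketch is self-contained."""


def check_schema_version(current_revision, head_revision):
    """Raise if the database revision differs from the newest migration.

    Both arguments are plain strings in this sketch. How they are obtained
    (for example through alembic) is deliberately left out.
    """
    if current_revision != head_revision:
        raise SimpleMLError(
            f"Database schema is at revision {current_revision!r}, "
            f"expected {head_revision!r}. Run an upgrade first."
        )
    return True
```

Failing early like this surfaces a stale schema as one clear error instead of as scattered column or table mismatches later on.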
-
class simpleml.utils.BaseDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)
Bases: object
Base database class to configure the db connection. Does not assume schema tracking or any other validation.
As of sqlalchemy 1.4.2, the signature of sqlalchemy.engine.url.URL changed: it is now an immutable object without an __init__.
- Parameters
use_ssh_tunnel – Boolean, default False. Whether to tunnel the sqlalchemy connection through an SSH tunnel.
sshtunnel_params – Dict of SSH parameters, passed through directly. Specify according to the sshtunnel project: https://github.com/pahaz/sshtunnel/
-
__repr__(self)
Return repr(self).
-
__str__(self)
Return str(self).
-
_initialize(self, base, create_tables=False, **kwargs)
Initialization method to set up the database connection and inject the session manager.
- Parameters
create_tables – Bool, whether to create tables in the database
drop_tables – Bool, whether to drop existing tables in the database
- Returns
None
-
close_tunnel(self)
-
configure_ssh_tunnel(self, credentials, ssh_config)
-
create_tables(self, base, drop_tables=False, ignore_errors=False)
Creates database tables (and optionally drops existing ones first). Assumes it is run by a sufficiently privileged user.
- Parameters
drop_tables – Whether to drop the existing tables first.
- Returns
None
-
property engine(self)
-
initialize(self, base_list, **kwargs)
Initialization method to set up the database connection and inject the session manager.
Raises a SimpleML error if the database schema is not up to date.
- Parameters
drop_tables – Bool, whether to drop existing tables in the database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
-
open_tunnel(self)
-
property ssh_tunnel(self)
-
class simpleml.utils.BinaryStorageDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)
Bases: simpleml.utils.initialization.BaseDatabase
Hardcoded database mapped to binary storage metadata.
- Parameters
use_ssh_tunnel – Boolean, default False. Whether to tunnel the sqlalchemy connection through an SSH tunnel.
sshtunnel_params – Dict of SSH parameters, passed through directly. Specify according to the sshtunnel project: https://github.com/pahaz/sshtunnel/
-
initialize(self, base_list=None, **kwargs)
Initialization method to set up the database connection and inject the session manager.
Raises a SimpleML error if the database schema is not up to date.
- Parameters
drop_tables – Bool, whether to drop existing tables in the database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
-
class simpleml.utils.Database(configuration_section=None, uri=None, database=None, username=None, password=None, drivername=None, host=None, port=None, query=None, *args, **kwargs)
Bases: simpleml.utils.initialization.AlembicDatabase
SimpleML-specific configuration to interact with the database. Defaults to a sqlite db in the filestore directory.
- Parameters
use_ssh_tunnel – Boolean, default False. Whether to tunnel the sqlalchemy connection through an SSH tunnel.
sshtunnel_params – Dict of SSH parameters, passed through directly. Specify according to the sshtunnel project: https://github.com/pahaz/sshtunnel/
-
initialize(self, base_list=None, **kwargs)
Initialization method to set up the database connection and inject the session manager.
Raises a SimpleML error if the database schema is not up to date.
- Parameters
drop_tables – Bool, whether to drop existing tables in the database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
-
class simpleml.utils.DatasetCreator
Bases: simpleml.utils.training.create_persistable.PersistableCreator
-
classmethod create(cls, registered_name, **kwargs)
Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.
- Parameters
registered_name – Class name registered in SimpleML
dataset_pipeline – dataset pipeline object
-
classmethod determine_filters(cls, strict=True, **kwargs)
Stateless method to determine which filters to apply when looking for an existing persistable.
Returns: database class, filter dictionary
- Parameters
registered_name – Class name registered in SimpleML
strict – whether to assume that the same class and name imply the same persistable, or to load the data and compare the hash
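The strict option trades speed for certainty: matching on class and name alone is cheap, while hashing the loaded data confirms that the contents are identical. Below is a minimal sketch of that trade-off, assuming that strict=True is the path that includes the hash. `build_dataset_filters`, `hash_rows`, and the filter keys are hypothetical names chosen for illustration, not the library's implementation.

```python
import hashlib
import json


def hash_rows(rows):
    """Deterministic content hash of JSON-serializable records."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_dataset_filters(name, registered_name, rows=None, strict=True):
    """Build a filter dictionary for looking up an existing dataset."""
    filters = {"name": name, "registered_name": registered_name}
    if strict:
        # Assumption: the strict path loads the data and compares by hash
        filters["hash"] = hash_rows(rows)
    return filters


loose = build_dataset_filters("titanic", "MyDataset", strict=False)
strict = build_dataset_filters("titanic", "MyDataset", rows=[{"age": 22}])
print(sorted(loose))
print(sorted(strict))
```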
-
classmethod
-
class simpleml.utils.DatasetDatabase(config=None, configuration_section=None, uri=None, use_ssh_tunnel=False, sshtunnel_params=None, **credentials)
Bases: simpleml.utils.initialization.BaseDatabase
Hardcoded database mapped to dataset storage metadata.
- Parameters
use_ssh_tunnel – Boolean, default False. Whether to tunnel the sqlalchemy connection through an SSH tunnel.
sshtunnel_params – Dict of SSH parameters, passed through directly. Specify according to the sshtunnel project: https://github.com/pahaz/sshtunnel/
-
initialize(self, base_list=None, **kwargs)
Initialization method to set up the database connection and inject the session manager.
Raises a SimpleML error if the database schema is not up to date.
- Parameters
drop_tables – Bool, whether to drop existing tables in the database
upgrade – Bool, whether to run an upgrade migration after establishing a connection
- Returns
None
-
class simpleml.utils.MetricCreator
Bases: simpleml.utils.training.create_persistable.PersistableCreator
-
classmethod create(cls, registered_name, **kwargs)
Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.
- Parameters
registered_name – Class name registered in SimpleML
model – model class
-
classmethod determine_filters(cls, strict=False, **kwargs)
Stateless method to determine which filters to apply when looking for an existing persistable.
Returns: database class, filter dictionary
- Parameters
registered_name – Class name registered in SimpleML
strict – whether to fit objects first before assuming they are identical. In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter), so you don't need to fit objects to be sure they are the same.
-
classmethod
-
class simpleml.utils.ModelCreator
Bases: simpleml.utils.training.create_persistable.PersistableCreator
-
classmethod create(cls, registered_name, **kwargs)
Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.
- Parameters
registered_name – Class name registered in SimpleML
pipeline – pipeline object
-
classmethod determine_filters(cls, strict=False, **kwargs)
Stateless method to determine which filters to apply when looking for an existing persistable.
Returns: database class, filter dictionary
- Parameters
registered_name – Class name registered in SimpleML
strict – whether to fit objects first before assuming they are identical. In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter), so you don't need to fit objects to be sure they are the same.
-
classmethod
-
class simpleml.utils.PersistableCreator
Bases: future.utils.with_metaclass()
-
abstract create(cls, **kwargs)
Method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.
-
abstract determine_filters(cls, strict=False, **kwargs)
Method to determine which filters to apply when looking for an existing persistable.
- Parameters
strict – whether to fit objects first before assuming they are identical. In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter), so you don't need to fit objects to be sure they are the same.
The default design iterates through 2 (or 3) options when retrieving persistables:
1) By name and version (unique properties that define persistables)
2) By name, registered_name, and computed hash
2.5) Optionally, just use name and registered_name (assumes the class definition is the same and would result in an identical persistable)
Returns: database class, filter dictionary
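One possible reading of the retrieval options above is a chain of progressively looser filters. The sketch below illustrates that reading over an in-memory list of dictionaries. It is not the library's query code, and `find_persistable`, the record keys, and the `allow_loose_match` flag are hypothetical.

```python
def _first_match(records, **filters):
    """Return the first record whose fields equal every filter, else None."""
    for record in records:
        if all(record.get(key) == value for key, value in filters.items()):
            return record
    return None


def find_persistable(records, name, registered_name=None, version=None,
                     hash_=None, allow_loose_match=False):
    # 1) name and version uniquely identify a persistable
    if version is not None:
        match = _first_match(records, name=name, version=version)
        if match is not None:
            return match
    # 2) name, registered_name, and computed hash
    if registered_name is not None and hash_ is not None:
        match = _first_match(records, name=name,
                             registered_name=registered_name, hash=hash_)
        if match is not None:
            return match
    # 2.5) optionally trust name and registered_name alone
    if allow_loose_match and registered_name is not None:
        return _first_match(records, name=name,
                            registered_name=registered_name)
    return None


records = [
    {"name": "m", "version": 1, "registered_name": "A", "hash": "h1"},
    {"name": "m", "version": 2, "registered_name": "B", "hash": "h2"},
]
print(find_persistable(records, "m", version=2))
```

Keeping the loosest option behind an explicit flag mirrors the "optionally" in the description: it is only safe when the class definition is known to be unchanged.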
-
static retrieve(cls, filters)
Queries the database using the table model (cls) and filters for a matching persistable.
-
classmethod retrieve_dataset(cls, dataset=None, dataset_id: str = None, dataset_kwargs=None, **kwargs)
-
static retrieve_dependency(dependency_cls, **dependency_kwargs)
Base method to query for a dependency. Raises TrainingError if the dependency does not exist.
-
static retrieve_from_registry(registered_name)
Stateless method to query the registry for class definitions. Handles errors.
-
classmethod retrieve_or_create(self, **kwargs)
Wrapper method that first attempts to retrieve a matching persistable and creates a new one if it isn’t found.
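The retrieve-then-create flow can be sketched with a dictionary standing in for the database. `InMemoryCreator` and its `_key` helper are hypothetical, and the real creators query table models rather than a dict, so treat this only as an outline of the control flow.

```python
class InMemoryCreator:
    """Hypothetical stand-in: a dict plays the role of the database."""

    store = {}

    @staticmethod
    def _key(kwargs):
        # Sort items so keyword order does not affect the lookup key
        return tuple(sorted(kwargs.items()))

    @classmethod
    def retrieve(cls, **kwargs):
        return cls.store.get(cls._key(kwargs))

    @classmethod
    def create(cls, **kwargs):
        persistable = dict(kwargs)  # stand-in for building and saving an object
        cls.store[cls._key(kwargs)] = persistable
        return persistable

    @classmethod
    def retrieve_or_create(cls, **kwargs):
        existing = cls.retrieve(**kwargs)
        if existing is not None:
            return existing
        return cls.create(**kwargs)


first = InMemoryCreator.retrieve_or_create(name="demo", registered_name="MyModel")
second = InMemoryCreator.retrieve_or_create(registered_name="MyModel", name="demo")
print(first is second)
```

The second call returns the object stored by the first one, so repeated runs with identical parameters do not create duplicates.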
-
abstract
-
class simpleml.utils.PersistableLoader
Bases: object
Wrapper class to load various persistables.
The sqlalchemy-mixins active record style allows for keyword-based filtering:
BaseClass.where(**filters).order_by(**ordering).first()
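In the active-record style, each keyword argument becomes an equality condition, and calls chain until a terminal method such as first(). The following is a minimal in-memory sketch of that chaining. `RecordQuery` and its method signatures are hypothetical stand-ins and do not reproduce the sqlalchemy-mixins API.

```python
class RecordQuery:
    """Hypothetical chainable query over a list of dictionaries."""

    def __init__(self, rows):
        self._rows = list(rows)

    def where(self, **filters):
        # Keep rows where every keyword matches the corresponding field
        kept = [row for row in self._rows
                if all(row.get(key) == value for key, value in filters.items())]
        return RecordQuery(kept)

    def order_by(self, key, descending=False):
        ordered = sorted(self._rows, key=lambda row: row[key], reverse=descending)
        return RecordQuery(ordered)

    def first(self):
        return self._rows[0] if self._rows else None


rows = [
    {"name": "default", "version": 1},
    {"name": "default", "version": 3},
    {"name": "other", "version": 2},
]
latest = RecordQuery(rows).where(name="default").order_by("version", descending=True).first()
print(latest)
```

Each step returns a new query object, which is what lets the calls chain without mutating the original rows.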
-
classmethod load_dataset(cls, name='default', **filters)
-
classmethod load_metric(cls, name, model_id, **filters)
-
classmethod load_model(cls, name='default', **filters)
-
classmethod load_persistable(cls, persistable_class, filters)
-
classmethod load_pipeline(cls, name='default', **filters)
-
static validate_environment(persistable)
-
class simpleml.utils.PipelineCreator
Bases: simpleml.utils.training.create_persistable.PersistableCreator
-
classmethod create(cls, registered_name, **kwargs)
Stateless method to create a new persistable with the desired parameters. kwargs are passed directly to the persistable.
- Parameters
registered_name – Class name registered in SimpleML
dataset – dataset object
-
classmethod determine_filters(cls, strict=False, **kwargs)
Stateless method to determine which filters to apply when looking for an existing persistable.
Returns: database class, filter dictionary
- Parameters
registered_name – Class name registered in SimpleML
strict – whether to fit objects first before assuming they are identical. In theory, if all inputs and classes are the same, the outputs should deterministically be the same as well (up to random iter), so you don't need to fit objects to be sure they are the same.
-
classmethod