simpleml.persistables.hashing

Mixin classes to handle hashing

Module Contents

Classes

CustomHasherMixin

Mixin class to hash any object

Functions

_pandas_hash(df)

Helper for hashing pandas objects; defined outside the local context so it can be pickled between threads

Attributes

LOGGER

__author__

simpleml.persistables.hashing.LOGGER
simpleml.persistables.hashing.__author__ = Elisha Yadgaran
class simpleml.persistables.hashing.CustomHasherMixin

Bases: object

Mixin class to hash any object

classmethod custom_hasher(cls, object_to_hash, raise_on_nonprimitive=False)

Adapted from: https://stackoverflow.com/questions/5884066/hashing-a-dictionary

Makes a hash from a dictionary, list, tuple, or set nested to any depth, provided it contains only other hashable types (including further lists, tuples, sets, and dictionaries). To hash other kinds of objects (such as classes), pass in a collection of the object attributes that are pertinent. For example, a class can be hashed in this fashion:

custom_hasher([cls.__dict__, cls.__name__])

A function can be hashed like so:

custom_hasher([fn.__dict__, fn.__code__])

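Python 3.3+ randomizes the built-in hash of str and bytes objects with a per-process seed by default, so those hash values differ between interpreter runs. Either set the environment variable PYTHONHASHSEED=0 globally or use a different hash function.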
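The effect of the hash seed can be checked with a small experiment. The sketch below is illustrative only and is not part of simpleml; the helper name `builtin_hash_in_fresh_interpreter` is hypothetical. It starts fresh interpreters with an explicit PYTHONHASHSEED and reads back the built-in hash of a string.

```python
import os
import subprocess
import sys


def builtin_hash_in_fresh_interpreter(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new interpreter started with the given seed.

    Hypothetical helper for illustration; not part of simpleml.
    """
    # Copy the current environment and pin the hash seed for the child process
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# With the seed pinned to 0, randomization is disabled and repeated runs agree
fixed_a = builtin_hash_in_fresh_interpreter("simpleml", "0")
fixed_b = builtin_hash_in_fresh_interpreter("simpleml", "0")

# A different seed will, in practice, produce a different value for the same string
other = builtin_hash_in_fresh_interpreter("simpleml", "12345")
```

Pinning the seed makes the built-in hash repeatable, but it has to be set before the interpreter starts, which is why a content-based digest is often the more convenient option.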

Reduces the input to primitive dtypes and then calls cls.md5_hasher for a consistent hash value. Falls back to joblib pickle hashing for other dtypes.

If raise_on_nonprimitive is True, raises a ValueError when object_to_hash would be hashed by object identity, which is inconsistent across instantiations, rather than by primitives or content values, which are consistent across instantiations.
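The description above can be approximated with a short recursive sketch. This is not the simpleml implementation: the names `md5_hex` and `hash_sketch` are hypothetical, the type tagging and key sorting are assumptions made for the example, and the fallback uses the standard library's pickle as a stand-in for the joblib pickle hashing mentioned above.

```python
import hashlib
import pickle
from typing import Any


def md5_hex(value: Any) -> str:
    """Deterministic hex digest of the string form of a primitive or tuple."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


def hash_sketch(obj: Any, raise_on_nonprimitive: bool = False) -> str:
    """Hypothetical content-based hasher; illustrative only."""
    flag = raise_on_nonprimitive
    if obj is None or isinstance(obj, (str, int, float, bool)):
        # Tag with the type name so that 1 and "1" hash differently
        return md5_hex((type(obj).__name__, obj))
    if isinstance(obj, (list, tuple)):
        # Order matters for sequences, so keep it
        return md5_hex((type(obj).__name__, tuple(hash_sketch(i, flag) for i in obj)))
    if isinstance(obj, (set, frozenset)):
        # Order does not matter for sets, so sort the member digests
        return md5_hex(("set", tuple(sorted(hash_sketch(i, flag) for i in obj))))
    if isinstance(obj, dict):
        # Sort (key digest, value digest) pairs so insertion order is ignored
        items = sorted((hash_sketch(k, flag), hash_sketch(v, flag)) for k, v in obj.items())
        return md5_hex(("dict", tuple(items)))
    if raise_on_nonprimitive:
        raise ValueError(f"Cannot hash {type(obj).__name__} from primitives alone")
    # Assumption: stdlib pickle stands in for the joblib fallback described above
    return hashlib.md5(pickle.dumps(obj)).hexdigest()


# Dictionaries with the same content hash identically regardless of key order
first = hash_sketch({"a": 1, "b": [1, 2, {"c": 3.0}]})
second = hash_sketch({"b": [1, 2, {"c": 3.0}], "a": 1})
```

Sorting the digests of dictionary items and set members is what makes the result independent of iteration order, while lists and tuples keep their order because it is part of their content.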

Parameters
  • object_to_hash (Any) – object to hash

  • raise_on_nonprimitive (bool) – if True, raise a ValueError instead of hashing by object identity

Return type

str

static md5_hasher(object_to_hash)

Generate a simple deterministic hash with md5 - only supports basic dtypes

Parameters

object_to_hash (Union[Tuple[float, str, int], float, str, int]) – value to hash; only basic dtypes are supported

Return type

str

simpleml.persistables.hashing._pandas_hash(df)

Helper for hashing pandas objects; defined outside the local context so it can be pickled between threads (a recursive class reference causes dask subgraph issues)
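A module-level DataFrame hasher along these lines might look like the sketch below. It is an assumption about the general approach and not the body of `_pandas_hash`: the name `pandas_hash_sketch` is hypothetical, and it relies on `pandas.util.hash_pandas_object` plus an md5 digest to collapse the per-row hashes.

```python
import hashlib

import pandas as pd


def pandas_hash_sketch(df: pd.DataFrame) -> str:
    """Hypothetical module-level helper; illustrative only.

    Being defined at module level (not inside a class or closure) means
    it can be pickled by reference and shipped to workers.
    """
    # One uint64 hash per row, with the index values included
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    # Collapse the per-row hashes into a single hex digest.
    # Note: column names are not part of this digest, only the values.
    return hashlib.md5(row_hashes.values.tobytes()).hexdigest()


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# An equal copy hashes identically; changing one value changes the digest
same = pandas_hash_sketch(df) == pandas_hash_sketch(df.copy())
changed = pandas_hash_sketch(df) != pandas_hash_sketch(df.assign(a=[1, 2, 4]))
```

Because the digest is computed from the data and not from object identity, two separately constructed frames with equal content produce the same hash.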