simpleml.persistables.hashing module

Mixin classes to handle hashing

class simpleml.persistables.hashing.CustomHasherMixin[source]

Bases: object

Mixin class to hash any object

custom_hasher(object_to_hash, custom_class_proxy=<class 'mappingproxy'>)[source]

Adapted from: https://stackoverflow.com/questions/5884066/hashing-a-dictionary

Makes a hash from a dictionary, list, tuple or set nested to any level, provided it contains only other hashable types (including any lists, tuples, sets, and dictionaries). Where other kinds of objects (like classes) need to be hashed, pass in a collection of the pertinent object attributes. For example, a class can be hashed in this fashion:

custom_hasher([cls.__dict__, cls.__name__])

A function can be hashed like so:

custom_hasher([fn.__dict__, fn.__code__])

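Python 3.3+ changes the default hash method to add a random seed, so the built-in hash of strings and bytes differs between interpreter runs. To get reproducible hashes, either set the environment variable PYTHONHASHSEED=0 or use a different hash function.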
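One way to avoid depending on the seed is to build a canonical string for the container and digest it with hashlib. The following is a minimal sketch of that idea, not the code behind custom_hasher; the names stable_hash and _canonical are hypothetical, and leaf values are assumed to have a repr that is stable across runs (str, int, float, bool, None).

```python
import hashlib


def _canonical(obj) -> str:
    """Build a deterministic string for nested dicts, lists, tuples and sets."""
    if isinstance(obj, dict):
        # Sort by the canonical form of each key so insertion order is irrelevant.
        items = sorted((_canonical(k), _canonical(v)) for k, v in obj.items())
        return "dict(" + ",".join(f"{k}:{v}" for k, v in items) + ")"
    if isinstance(obj, (set, frozenset)):
        # Sets have no defined order, so sort the canonical forms of the members.
        return "set(" + ",".join(sorted(_canonical(i) for i in obj)) + ")"
    if isinstance(obj, (list, tuple)):
        # Order matters here; keep the type name so [1, 2] and (1, 2) differ.
        return type(obj).__name__ + "(" + ",".join(_canonical(i) for i in obj) + ")"
    # Assumption: leaves have a repr that does not change between runs.
    return f"{type(obj).__name__}:{obj!r}"


def stable_hash(obj) -> str:
    """Return an MD5 hex digest that does not depend on PYTHONHASHSEED."""
    return hashlib.md5(_canonical(obj).encode("utf-8")).hexdigest()
```

Because the digest comes from hashlib rather than the built-in hash, stable_hash({"a": 1, "b": [1, 2]}) returns the same string in every interpreter session, whatever the key order.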

class simpleml.persistables.hashing.Hasher(hash_name='md5')[source]

Bases: pickle._Pickler

A subclass of pickler, to do cryptographic hashing, rather than pickling.
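The general technique can be sketched as follows: serialize the object into an in-memory buffer with a Pickler subclass, then feed the resulting bytes to a hashlib digest. This is an illustration under stated assumptions, not the class documented here; SketchHasher is a hypothetical name, it subclasses the public pickle.Pickler rather than pickle._Pickler, and it omits the custom save, save_set, save_global and memoize handling.

```python
import hashlib
import io
import pickle


class SketchHasher(pickle.Pickler):
    """Hypothetical stand-in: pickle into a buffer, then digest the bytes."""

    def __init__(self, hash_name="md5"):
        self.stream = io.BytesIO()
        # Pin the protocol so the byte stream does not change with the default.
        super().__init__(self.stream, protocol=3)
        self._hash = hashlib.new(hash_name)

    def hash(self, obj, return_digest=True):
        # Serialize the object, then hash the serialized bytes.
        self.dump(obj)
        self._hash.update(self.stream.getvalue())
        if return_digest:
            return self._hash.hexdigest()
        return None


digest = SketchHasher("sha1").hash([1, 2, {"a": 3}])
```

A plain pickle stream reflects the iteration order of sets and dictionaries, so this sketch can give different digests for equal containers built in a different order. Handling such cases is presumably the purpose of overrides such as save_set.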

dispatch = {<class 'NoneType'>: <function _Pickler.save_none>, <class 'bool'>: <function _Pickler.save_bool>, <class 'int'>: <function _Pickler.save_long>, <class 'float'>: <function _Pickler.save_float>, <class 'bytes'>: <function _Pickler.save_bytes>, <class 'str'>: <function _Pickler.save_str>, <class 'tuple'>: <function _Pickler.save_tuple>, <class 'list'>: <function _Pickler.save_list>, <class 'dict'>: <function save_module_dict>, <class 'set'>: <function Hasher.save_set>, <class 'frozenset'>: <function _Pickler.save_frozenset>, <class 'function'>: <function save_function>, <class 'type'>: <function Hasher.save_global>, <class 'code'>: <function save_code>, <class '_thread.lock'>: <function save_lock>, <class '_thread.RLock'>: <function save_rlock>, <class 'operator.itemgetter'>: <function save_itemgetter>, <class 'operator.attrgetter'>: <function save_attrgetter>, <class '_io.TextIOWrapper'>: <function save_file>, <class '_io.BufferedWriter'>: <function save_file>, <class '_io.BufferedReader'>: <function save_file>, <class '_io.BufferedRandom'>: <function save_file>, <class '_io.FileIO'>: <function save_file>, <class '_pyio.TextIOWrapper'>: <function save_file>, <class '_pyio.BufferedWriter'>: <function save_file>, <class '_pyio.BufferedReader'>: <function save_file>, <class '_pyio.BufferedRandom'>: <function save_file>, <class 'functools.partial'>: <function save_functor>, <class 'super'>: <function save_super>, <class 'builtin_function_or_method'>: <function Hasher.save_global>, <class 'method'>: <function save_instancemethod0>, <class 'classmethod_descriptor'>: <function save_wrapper_descriptor>, <class 'wrapper_descriptor'>: <function save_wrapper_descriptor>, <class 'method_descriptor'>: <function save_wrapper_descriptor>, <class 'getset_descriptor'>: <function save_wrapper_descriptor>, <class 'member_descriptor'>: <function save_wrapper_descriptor>, <class 'method-wrapper'>: <function save_instancemethod>, <class 'cell'>: <function save_cell>, <class 
'mappingproxy'>: <function save_dictproxy>, <class 'slice'>: <function save_slice>, <class 'NotImplementedType'>: <function save_singleton>, <class 'ellipsis'>: <function save_singleton>, <class 'range'>: <function save_singleton>, <class 'weakref'>: <function save_weakref>, <class 'weakcallableproxy'>: <function save_weakproxy>, <class 'weakproxy'>: <function save_weakproxy>, <class 'module'>: <function save_module>, <class 'property'>: <function save_property>, <class 'classmethod'>: <function save_classmethod>, <class 'staticmethod'>: <function save_classmethod>}
hash(obj, return_digest=True)[source]
memoize(obj)[source]

Store an object in the memo.

save(obj)[source]
save_global(obj, name=None, pack=<built-in function pack>)[source]
save_set(set_items)[source]
class simpleml.persistables.hashing.NumpyHasher(hash_name='md5', coerce_mmap=False)[source]

Bases: simpleml.persistables.hashing.Hasher

Special case the hasher for when numpy is loaded.

save(obj)[source]

Subclass the save method to hash ndarray subclasses rather than pickling them. Of course, this is a total abuse of the Pickler class.

simpleml.persistables.hashing.hash(obj, hash_name='md5', coerce_mmap=False)[source]

Quick calculation of a hash to uniquely identify Python objects containing numpy arrays.

Parameters

hash_name: 'md5' or 'sha1'
    Hashing algorithm used. sha1 is supposedly safer, but md5 is faster.
coerce_mmap: boolean
    If True, hash np.memmap and np.ndarray objects identically.