simpleml.persistables.hashing

Mixin classes to handle hashing

Module Contents

Classes

CustomHasherMixin

Mixin class to hash any object

Hasher

A subclass of pickler, to do cryptographic hashing, rather than pickling.

NumpyHasher

Special case the hasher for when numpy is loaded.

_ConsistentSet

Class used to ensure the hash of Sets is preserved whatever the order of its items.

_MyHash

Class used to hash objects that won’t normally pickle

Functions

hash(obj, hash_name='md5', coerce_mmap=False)

Quick calculation of a hash to uniquely identify Python objects containing numpy arrays.

simpleml.persistables.hashing.PY3_OR_LATER
simpleml.persistables.hashing.Pickler
simpleml.persistables.hashing.__author__ = Elisha Yadgaran
simpleml.persistables.hashing._basestring
class simpleml.persistables.hashing.CustomHasherMixin

Bases: object

Mixin class to hash any object

classmethod custom_hasher(cls, object_to_hash, custom_class_proxy=type(object.__dict__))

Adapted from: https://stackoverflow.com/questions/5884066/hashing-a-dictionary

Makes a hash from a dictionary, list, tuple or set to any level, that contains only other hashable types (including any lists, tuples, sets, and dictionaries). In the case where other kinds of objects (like classes) need to be hashed, pass in a collection of object attributes that are pertinent. For example, a class can be hashed in this fashion:

custom_hasher([cls.__dict__, cls.__name__])

A function can be hashed like so:

custom_hasher([fn.__dict__, fn.__code__])

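Python 3.3+ changes the default hash method to add a random seed, so the built-in hash() of str and bytes objects differs between interpreter runs. To get reproducible hashes, set the environment variable PYTHONHASHSEED=0 or use a different hash function.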
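The recursive idea described above can be sketched as follows. This is a minimal illustration, not simpleml's implementation: the helper name sketch_nested_hash is hypothetical, and it swaps the built-in hash() for hashlib.md5 so that results do not depend on PYTHONHASHSEED. Leaf values are hashed through repr(), which is assumed to be stable only for simple values such as strings and numbers.

```python
import hashlib


def sketch_nested_hash(obj):
    """Hypothetical helper: stable md5 hex digest for nested containers."""
    if isinstance(obj, dict):
        # Digest every key/value pair, then sort so insertion order is irrelevant.
        parts = sorted(
            sketch_nested_hash(k) + sketch_nested_hash(v) for k, v in obj.items()
        )
        tag = "dict"
    elif isinstance(obj, (set, frozenset)):
        # Sets are unordered, so sort the element digests as well.
        parts = sorted(sketch_nested_hash(item) for item in obj)
        tag = "set"
    elif isinstance(obj, (list, tuple)):
        # Sequences keep their order: [1, 2] and [2, 1] must hash differently.
        parts = [sketch_nested_hash(item) for item in obj]
        tag = type(obj).__name__
    else:
        # Leaf value: assumption that repr() is stable for this type.
        parts = [repr(obj)]
        tag = type(obj).__name__
    payload = (tag + ":" + ",".join(parts)).encode("utf-8")
    return hashlib.md5(payload).hexdigest()
```

Because hashlib digests are not salted per process, the same input gives the same digest across interpreter runs, which is the property the note about PYTHONHASHSEED is concerned with.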

class simpleml.persistables.hashing.Hasher(hash_name='md5')

Bases: Pickler

A subclass of pickler, to do cryptographic hashing, rather than pickling.

This takes a binary file for writing a pickle data stream.

The optional protocol argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol is 3, a backward-incompatible protocol designed for Python 3.

Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

The file argument must have a write() method that accepts a single bytes argument. It can thus be a file object opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.

If fix_imports is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.

dispatch
_batch_setitems(self, items)
hash(self, obj, return_digest=True)
memoize(self, obj)

Store an object in the memo.

save(self, obj)
save_global(self, obj, name=None, pack=struct.pack)
save_set(self, set_items)
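As a rough illustration of "hashing rather than pickling", the sketch below subclasses pickle.Pickler, writes the pickle stream into an in-memory buffer, and digests the resulting bytes. The class name SketchHasher is hypothetical, and the sketch does not reproduce the overrides listed above (save, memoize, save_global, save_set, _batch_setitems); it only shows the basic shape, under the assumption that the pickle byte stream of the object is deterministic.

```python
import hashlib
import io
import pickle


class SketchHasher(pickle.Pickler):
    """Hypothetical stand-in: pickle into memory, then digest the bytes."""

    def __init__(self, hash_name="md5"):
        stream = io.BytesIO()
        # Pin the protocol so the byte stream does not follow the
        # interpreter's default, which varies between Python versions.
        super().__init__(stream, protocol=2)
        self.stream = stream
        self._hash = hashlib.new(hash_name)

    def hash(self, obj):
        # dump() writes the pickle byte stream for obj into the buffer.
        self.dump(obj)
        self._hash.update(self.stream.getvalue())
        return self._hash.hexdigest()


# Use a fresh instance per object, since the buffer and digest accumulate.
digest = SketchHasher().hash([1, "a", (2.5, None)])
```

Reusing the Pickler machinery means any picklable object can be digested without writing a traversal by hand; the cost is that anything affecting the pickle stream (protocol, container ordering) also affects the digest.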
class simpleml.persistables.hashing.NumpyHasher(hash_name='md5', coerce_mmap=False)

Bases: simpleml.persistables.hashing.Hasher

Special case the hasher for when numpy is loaded.

hash_name: string

The hash algorithm to be used

coerce_mmap: boolean

Treat np.memmap and np.ndarray objects as equivalent.

save(self, obj)

Subclass the save method, to hash ndarray subclasses, rather than pickling them. Of course, this is a total abuse of the Pickler class.

class simpleml.persistables.hashing._ConsistentSet(set_sequence)

Bases: object

Class used to ensure the hash of Sets is preserved whatever the order of its items.
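One way to get an order-independent digest for a set is to digest each element separately and sort the digests before combining them. The sketch below illustrates that idea only; the helper name sketch_set_digest is hypothetical and the approach is an assumption for illustration, not a description of how _ConsistentSet is implemented.

```python
import hashlib
import pickle


def sketch_set_digest(items):
    """Hypothetical helper: digest a collection independently of item order."""
    # Digest each element on its own, then sort the digests so the
    # combined result no longer depends on iteration order.
    element_digests = sorted(
        hashlib.md5(pickle.dumps(item, protocol=2)).hexdigest() for item in items
    )
    return hashlib.md5("".join(element_digests).encode("ascii")).hexdigest()
```

Sorting digests rather than the elements themselves avoids errors when the elements are of types that cannot be compared with each other.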

class simpleml.persistables.hashing._MyHash(*args)

Bases: object

Class used to hash objects that won’t normally pickle

simpleml.persistables.hashing.hash(obj, hash_name='md5', coerce_mmap=False)

Quick calculation of a hash to uniquely identify Python objects containing numpy arrays.

Parameters

hash_name: 'md5' or 'sha1'

Hashing algorithm used. sha1 is supposedly safer, but md5 is faster.

coerce_mmap: boolean

Treat np.memmap and np.ndarray objects as equivalent.