simpleml.models.split_iterators

Helper classes to iterate splits

Module Contents

Classes

DataIterator

DatasetSequence

Sequence wrapper for internal datasets. Only used for raw data mapping, so the return type is the internal Split object.

PipelineTransformIterator

Wrapper utility to convert a pipeline transform operation into an iterator

PipelineTransformSequence

Nested sequence class to apply transforms on batches in real time and forward them through as the next batch.

PythonIterator

Pure Python iterator. Converts a split object into a generator with defined batch sizes.

Functions

split_to_ordered_tuple(split)

Helper to convert a split object into an ordered tuple of X, y, other.

Attributes

__author__

simpleml.models.split_iterators.__author__ = Elisha Yadgaran

class simpleml.models.split_iterators.DataIterator

Bases: object

__iter__(self)

class simpleml.models.split_iterators.DatasetSequence(split, batch_size=32, shuffle=True, return_tuple=True, **kwargs)

Bases: simpleml.imports.Sequence

Sequence wrapper for internal datasets. Only used for raw data mapping, so the return type is the internal Split object. Transformed sequences are used to conform to external input types (Keras tuples).

__getitem__(self, index)

Gets the batch at position index.

Parameters

index – position of the batch in the Sequence.

Returns

A batch.

Return type

simpleml.datasets.dataset_splits.Split

__len__(self)

Returns the number of batches in the Sequence.

generate_indices(self)

Return type

None

on_epoch_end(self)

Method called at the end of every epoch.

Return type

None

static validated_split(split)

Confirms the data is valid; otherwise returns None, which makes downstream checking simpler.

Parameters

split (Any) –

Return type

Any
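
The batching behaviour described for DatasetSequence can be illustrated with a minimal, self-contained sketch. It does not use SimpleML or Keras: ToyBatchSequence, its seed argument, and the ceiling-based batch count are assumptions made for illustration, and plain lists stand in for Split objects.

```python
import math
import random


class ToyBatchSequence:
    """Hypothetical stand-in for a Keras-style Sequence over in-memory rows."""

    def __init__(self, rows, batch_size=32, shuffle=True, seed=0):
        self.rows = list(rows)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)  # seeded only to keep the sketch repeatable
        self.generate_indices()

    def generate_indices(self):
        # Rebuild the row order; shuffle it when requested.
        self.indices = list(range(len(self.rows)))
        if self.shuffle:
            self._rng.shuffle(self.indices)

    def __len__(self):
        # Number of batches, counting a final partial batch (an assumption).
        return math.ceil(len(self.rows) / self.batch_size)

    def __getitem__(self, index):
        # Slice the index list for the requested batch, then look up the rows.
        start = index * self.batch_size
        batch_indices = self.indices[start:start + self.batch_size]
        return [self.rows[i] for i in batch_indices]

    def on_epoch_end(self):
        # Regenerate indices so a shuffled order differs between epochs.
        self.generate_indices()


seq = ToyBatchSequence(range(10), batch_size=4, shuffle=False)
batches = [seq[i] for i in range(len(seq))]
```

With shuffling disabled, the ten rows fall into two full batches and one partial batch, which is the case the batch count has to account for.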

class simpleml.models.split_iterators.PipelineTransformIterator(pipeline, data_iterator)

Bases: DataIterator

Wrapper utility to convert a pipeline transform operation into an iterator. Transforms each batch on iteration with the provided pipeline.

__iter__(self)

__next__(self)

NOTE: Some downstream objects expect to consume a generator that yields a tuple of X, y, other… rather than a Split object, so an ordered tuple is returned if the dataset iterator returns a tuple.

Return type

Union[simpleml.datasets.dataset_splits.Split, Tuple]
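
As a rough illustration of the wrapping pattern, the sketch below applies a transform to each batch as it is pulled from an underlying iterator. ToyTransformIterator and double are hypothetical names, a plain callable stands in for the pipeline, and transforming only the first tuple element is an assumption of this sketch rather than documented SimpleML behaviour.

```python
class ToyTransformIterator:
    """Hypothetical wrapper: applies `transform` to each batch of `data_iterator`."""

    def __init__(self, transform, data_iterator):
        self.transform = transform
        self.data_iterator = data_iterator
        self._current = iter(data_iterator)

    def __iter__(self):
        # Start a fresh pass over the underlying iterable.
        self._current = iter(self.data_iterator)
        return self

    def __next__(self):
        batch = next(self._current)  # StopIteration propagates to the caller
        if isinstance(batch, tuple):
            # Assumption: tuple batches are (X, y, ...); only X is transformed.
            x, *rest = batch
            return (self.transform(x), *rest)
        return self.transform(batch)


def double(values):
    # Toy transform standing in for a fitted pipeline.
    return [2 * v for v in values]


raw_batches = [([1, 2], [0, 1]), ([3], [1])]
transformed = list(ToyTransformIterator(double, raw_batches))
```

Because the input batches are tuples, the output batches are tuples too, mirroring the note on `__next__`.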

class simpleml.models.split_iterators.PipelineTransformSequence(pipeline, dataset_sequence)

Bases: simpleml.imports.Sequence

Nested sequence class to apply transforms on batches in real time and forward them through as the next batch.

__getitem__(self, *args, **kwargs)

Pass-through to the dataset sequence: applies the transform to the data and returns the transformed batch.

Return type

Union[simpleml.datasets.dataset_splits.Split, Tuple]

__len__(self)

Pass-through. Returns the number of batches in the dataset sequence.

on_epoch_end(self)

Return type

None
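
The pass-through design can be sketched as a thin wrapper that delegates length and epoch hooks to the wrapped sequence and transforms each batch on lookup. ToyTransformSequence is a hypothetical name; a list of lists stands in for the dataset sequence and a lambda stands in for the pipeline.

```python
class ToyTransformSequence:
    """Hypothetical nested sequence: transforms batches from a wrapped sequence."""

    def __init__(self, transform, dataset_sequence):
        self.transform = transform
        self.dataset_sequence = dataset_sequence

    def __getitem__(self, index):
        # Fetch the raw batch, then transform it before returning.
        return self.transform(self.dataset_sequence[index])

    def __len__(self):
        # Pass-through: batch count comes from the wrapped sequence.
        return len(self.dataset_sequence)

    def on_epoch_end(self):
        # Forward the hook only if the wrapped object defines one.
        hook = getattr(self.dataset_sequence, "on_epoch_end", None)
        if callable(hook):
            hook()


raw_batches = [[1, 2], [3, 4], [5]]
wrapped = ToyTransformSequence(lambda batch: [x * 10 for x in batch], raw_batches)
```

Keeping the wrapper stateless means shuffling and indexing stay the responsibility of the inner sequence.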

class simpleml.models.split_iterators.PythonIterator(split, infinite_loop=False, batch_size=32, shuffle=True, return_tuple=False, **kwargs)

Bases: DataIterator

Pure Python iterator. Converts a split object into a generator with defined batch sizes.

__iter__(self)

__next__(self)

Turns a dataset split into a generator.

Return type

Union[simpleml.datasets.dataset_splits.Split, Tuple]
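
A minimal sketch of turning in-memory rows into a batch generator follows. The function name iterate_batches and the seed argument are hypothetical; the batch_size, shuffle, and infinite_loop parameters mirror the constructor signature above, but their exact semantics in SimpleML are assumed here, not confirmed.

```python
import itertools
import random


def iterate_batches(rows, batch_size=32, shuffle=True, infinite_loop=False, seed=0):
    """Yield lists of at most `batch_size` rows; optionally loop forever."""
    rows = list(rows)
    if not rows:
        return  # nothing to yield; also avoids spinning when infinite_loop=True
    rng = random.Random(seed)
    while True:
        # Regenerate (and optionally reshuffle) the index order for each pass.
        indices = list(range(len(rows)))
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            yield [rows[i] for i in indices[start:start + batch_size]]
        if not infinite_loop:
            return


finite = list(iterate_batches(range(5), batch_size=2, shuffle=False))
looped = list(itertools.islice(
    iterate_batches(range(5), batch_size=2, shuffle=False, infinite_loop=True), 5))
```

An infinite generator has to be bounded by the consumer, here with itertools.islice, which is typically what a training loop does with a steps-per-epoch setting.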

generate_indices(self)

simpleml.models.split_iterators.split_to_ordered_tuple(split)

Helper to convert a split object into an ordered tuple of X, y, other.

Parameters

split (simpleml.datasets.dataset_splits.Split) –

Return type

Tuple
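
The ordering idea can be sketched with a plain dict standing in for a Split object. The helper name to_ordered_tuple, the dict representation, and the rule that remaining keys follow in sorted order are all assumptions for illustration; the source only states that the result is ordered as X, y, other.

```python
def to_ordered_tuple(split):
    """Hypothetical helper: order mapping entries as X, y, then everything else."""
    ordered = []
    # X and y come first, in that order, when present.
    for key in ("X", "y"):
        if key in split:
            ordered.append(split[key])
    # Remaining entries follow in sorted key order (an assumption for determinism).
    for key in sorted(k for k in split if k not in ("X", "y")):
        ordered.append(split[key])
    return tuple(ordered)


result = to_ordered_tuple({"y": [0, 1], "weights": [1.0, 0.5], "X": [[1], [2]]})
```

A fixed order lets downstream consumers unpack batches positionally, for example as `X, y, *rest = result`.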