simpleml.metrics

Import modules to register class names in the global registry

Expose all classes through a single import module

Submodules

Package Contents

Classes

AccuracyMetric

TODO: Figure out multiclass generalizations

BinaryClassificationMetric

TODO: Figure out multiclass generalizations

ClassificationMetric

TODO: Figure out multiclass generalizations

F1ScoreMetric

TODO: Figure out multiclass generalizations

FprMetric

TODO: Figure out multiclass generalizations

FprTprMetric

TODO: Figure out multiclass generalizations

Metric

Base class for all Metric objects

RocAucMetric

TODO: Figure out multiclass generalizations

ThresholdAccuracyMetric

TODO: Figure out multiclass generalizations

ThresholdF1ScoreMetric

TODO: Figure out multiclass generalizations

ThresholdFdrMetric

TODO: Figure out multiclass generalizations

ThresholdFnrMetric

TODO: Figure out multiclass generalizations

ThresholdForMetric

TODO: Figure out multiclass generalizations

ThresholdFprMetric

TODO: Figure out multiclass generalizations

ThresholdInformednessMetric

TODO: Figure out multiclass generalizations

ThresholdMarkednessMetric

TODO: Figure out multiclass generalizations

ThresholdMccMetric

TODO: Figure out multiclass generalizations

ThresholdNpvMetric

TODO: Figure out multiclass generalizations

ThresholdPpvMetric

TODO: Figure out multiclass generalizations

ThresholdTnrMetric

TODO: Figure out multiclass generalizations

ThresholdTprMetric

TODO: Figure out multiclass generalizations

TprMetric

TODO: Figure out multiclass generalizations

Attributes

__author__

simpleml.metrics.__author__ = Elisha Yadgaran[source]
class simpleml.metrics.AccuracyMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)
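The aggregate contract can be illustrated with a minimal stand-in for `_score`. The actual implementation is not shown in these docs and likely delegates to a library; the function name and body below are illustrative only:

```python
def accuracy_score(predictions, labels):
    """Illustrative aggregate: fraction of predictions matching the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)
```

Called with hard class predictions (not probabilities), matching the documented `_score(predictions, labels)` signature.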

class simpleml.metrics.BinaryClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: ClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _create_confusion_matrix(thresholds, probabilities, labels)

Independent computation method (easier testing)

property accuracy(self)

Convenience property for the Accuracy Rate ((TP+TN)/(TP+FP+TN+FN))

property confusion_matrix(self)

Property method to return (or generate) a dataframe of the confusion matrix at each threshold

create_confusion_matrix(self)

Iterate through each threshold and compute the confusion matrix

static dedupe_curve(keys, values, maximize=True, round_places=3)

Method to deduplicate multiple values for the same key on a curve (e.g. multiple thresholds with the same fpr but different tpr on a roc curve)

Parameters

maximize – Boolean; whether to choose the maximum or the minimum value for each unique key
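The dedupe behavior described above can be sketched in a few lines. The signature mirrors the documented one, but the internals (rounding the key before grouping, returning sorted parallel lists) are assumptions for illustration:

```python
def dedupe_curve(keys, values, maximize=True, round_places=3):
    """Keep one value per (rounded) key: the max if maximize, else the min."""
    best = {}
    chooser = max if maximize else min
    for k, v in zip(keys, values):
        k = round(k, round_places)
        best[k] = chooser(best[k], v) if k in best else v
    # return sorted, parallel lists of unique keys and their chosen values
    deduped = sorted(best.items())
    return [k for k, _ in deduped], [v for _, v in deduped]
```

For a roc curve this collapses multiple thresholds sharing one fpr down to the best tpr at that fpr.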

property f1(self)

Convenience property for the F1 Score (2*TP/(2*TP+FP+FN))

property false_discovery_rate(self)

Convenience property for the False Discovery Rate (FP/(FP+TP))

property false_negative_rate(self)

Convenience property for the False Negative Rate (FN/(TP+FN))

property false_omission_rate(self)

Convenience property for the False Omission Rate (FN/(TN+FN))

property false_positive_rate(self)

Convenience property for the False Positive Rate (FP/(FP+TN))

property informedness(self)

Convenience property for the Informedness (TPR+TNR-1)

property labels(self)
property markedness(self)

Convenience property for the Markedness (PPV+NPV-1)

property matthews_correlation_coefficient(self)

Convenience property for the Matthews Correlation Coefficient ((TP*TN-FP*FN)/((FP+TP)*(TP+FN)*(TN+FP)*(TN+FN))^0.5)

property negative_predictive_value(self)

Convenience property for the Negative Predictive Value (TN/(TN+FN))

property positive_predictive_value(self)

Convenience property for the Positive Predictive Value (TP/(FP+TP))

property predicted_negative_rate(self)

Convenience property for the Predicted Negative Rate ((TN+FN)/(TP+FP+TN+FN))

property predicted_positive_rate(self)

Convenience property for the Predicted Positive Rate ((TP+FP)/(TP+FP+TN+FN))

property predictions(self)
property probabilities(self)
property thresholds(self)

Convenience property for the probability thresholds

property true_negative_rate(self)

Convenience property for the True Negative Rate (TN/(FP+TN))

property true_positive_rate(self)

Convenience property for the True Positive Rate (TP/(TP+FN))

static validate_labels(labels)
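The convenience properties above all derive from the same four confusion-matrix counts. A standalone sketch of the documented formulas, with the counts supplied directly rather than computed per threshold as the class does:

```python
def rates(tp, fp, tn, fn):
    """Derived rates for one confusion matrix, per the documented formulas."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "positive_predictive_value": tp / (fp + tp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        # informedness = TPR + TNR - 1
        "informedness": tp / (tp + fn) + tn / (fp + tn) - 1,
    }
```

The class computes these at every threshold; this helper shows a single cell of that table.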
class simpleml.metrics.ClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: simpleml.metrics.base_metric.Metric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

_get_split(self, column)
Parameters

column (str) –

Return type

Any

property labels(self)
Return type

Any

property predictions(self)
Return type

Any

property probabilities(self)
Return type

Any

static validate_predictions(predictions)
Parameters

predictions (Any) –

Return type

None

class simpleml.metrics.F1ScoreMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values
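The kind of (fpr, tpr) pairs a curve metric like this would store in `self.values` can be produced as below. The threshold grid and the returned shape are assumptions for illustration; the class derives its counts from the confusion-matrix machinery documented above:

```python
def fpr_tpr_curve(probabilities, labels, thresholds):
    """(fpr, tpr) point per threshold, thresholding probabilities at >= t."""
    points = []
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probabilities]
        tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
        tn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 0)
        fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points
```

Deduplicating these points by fpr (via `dedupe_curve`) yields a clean roc curve.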

class simpleml.metrics.Metric(dataset_id=None, model_id=None, **kwargs)[source]

Bases: simpleml.persistables.base_persistable.Persistable

Base class for all Metric objects

Parameters
  • dataset_id (Optional[Union[str, uuid.uuid4]]) –

  • model_id (Optional[Union[str, uuid.uuid4]]) –

object_type: str = METRIC
_get_dataset_split(self, **kwargs)

Default accessor for dataset data. Refers to the raw dataset splits, not the splits superimposed by pipelines. That means datasets that do not define explicit splits will have no notion of downstream splits (e.g. RandomSplitPipeline)

Return type

Any

_get_latest_version(self)

Versions should be autoincrementing for each object (constrained over friendly name and model). Executes a database lookup and increments.

Return type

int

_get_pipeline_split(self, column, split, **kwargs)

For the special case where the dataset is the same as the model's dataset, the dataset splits can refer to the pipeline-imposed splits rather than the dataset's inherent splits; use the pipeline split in that case. Ex: a RandomSplitPipeline on a NoSplitDataset evaluating "in_sample" performance

Parameters
  • column (str) –

  • split (str) –

Return type

Any

_hash(self)
Hash is the combination of the:
  1. Model

  2. Dataset (optional)

  3. Metric

  4. Config

Return type

str

_load_dataset(self)

Helper to fetch the dataset

_load_model(self)

Helper to fetch the model

add_dataset(self, dataset)

Setter method for dataset used

Parameters

dataset (simpleml.datasets.base_dataset.Dataset) –

Return type

None

add_model(self, model)

Setter method for model used

Parameters

model (simpleml.models.base_model.Model) –

Return type

None

property dataset(self)

Use a weakref to bind the linked dataset so it doesn't bloat usage; returns the dataset if still available, otherwise tries to fetch it

property model(self)

Use a weakref to bind the linked model so it doesn't bloat usage; returns the model if still available, otherwise tries to fetch it

save(self, **kwargs)

Extend parent function with a few additional save routines

Return type

None

abstract score(self, **kwargs)

Abstract method for each metric to define

Should set self.values
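The abstract contract (implement `score()`, have it set `self.values`) can be shown with a lightweight stand-in. The real `Metric` base additionally handles persistence, hashing, versioning, and dataset/model binding; the classes below are purely illustrative and hypothetical:

```python
from abc import ABC, abstractmethod


class MetricStandIn(ABC):
    """Illustrative stand-in for simpleml's Metric base class.

    Only the scoring contract is mirrored here.
    """

    @abstractmethod
    def score(self, **kwargs):
        """Each metric defines this; it should set self.values"""


class PrevalenceMetric(MetricStandIn):
    """Hypothetical metric: fraction of positive labels in a split."""

    def __init__(self, labels):
        self.labels = labels

    def score(self):
        self.values = sum(self.labels) / len(self.labels)
```

After `score()` runs, `values` holds the result; in the real class, `save()` would then persist it alongside the model and dataset references.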

class simpleml.metrics.RocAucMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(probabilities, labels)
score(self)

Abstract method for each metric to define

Should set self.values
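`_score(probabilities, labels)` takes probabilities rather than hard predictions because AUC is threshold-free. A pure-Python equivalent using the rank-sum (Mann-Whitney) identity, shown to make the contract concrete; ties between probabilities are not specially handled here, and the actual implementation likely delegates to a library:

```python
def roc_auc(probabilities, labels):
    """ROC AUC for binary labels via the rank-sum identity."""
    pairs = sorted(zip(probabilities, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # sum of 1-based ranks (by ascending probability) of the positive examples
    rank_sum = sum(rank for rank, (_, label) in enumerate(pairs, start=1)
                   if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A perfect ranking gives 1.0; a random one hovers near 0.5.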

class simpleml.metrics.ThresholdAccuracyMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values
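The "Threshold" family of metrics computes a statistic at every probability threshold instead of one aggregate value. A sketch for accuracy; the other ThresholdXxxMetric classes follow the same pattern with their respective formulas, and the stored structure here is an assumption:

```python
def threshold_accuracy(probabilities, labels, thresholds):
    """Accuracy at each probability threshold, keyed by threshold."""
    results = {}
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probabilities]
        correct = sum(p == l for p, l in zip(preds, labels))
        results[t] = correct / len(labels)
    return results
```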

class simpleml.metrics.ThresholdF1ScoreMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFdrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdForMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdInformednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMarkednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMccMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdNpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdPpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.TprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)