simpleml.metrics

Import modules to register class names in the global registry

Expose classes in a single import module

Submodules

Package Contents

Classes

AccuracyMetric

TODO: Figure out multiclass generalizations

BinaryClassificationMetric

TODO: Figure out multiclass generalizations

ClassificationMetric

TODO: Figure out multiclass generalizations

F1ScoreMetric

TODO: Figure out multiclass generalizations

FprMetric

TODO: Figure out multiclass generalizations

FprTprMetric

TODO: Figure out multiclass generalizations

Metric

Base class for all Metric objects

RocAucMetric

TODO: Figure out multiclass generalizations

ThresholdAccuracyMetric

TODO: Figure out multiclass generalizations

ThresholdF1ScoreMetric

TODO: Figure out multiclass generalizations

ThresholdFdrMetric

TODO: Figure out multiclass generalizations

ThresholdFnrMetric

TODO: Figure out multiclass generalizations

ThresholdForMetric

TODO: Figure out multiclass generalizations

ThresholdFprMetric

TODO: Figure out multiclass generalizations

ThresholdInformednessMetric

TODO: Figure out multiclass generalizations

ThresholdMarkednessMetric

TODO: Figure out multiclass generalizations

ThresholdMccMetric

TODO: Figure out multiclass generalizations

ThresholdNpvMetric

TODO: Figure out multiclass generalizations

ThresholdPpvMetric

TODO: Figure out multiclass generalizations

ThresholdTnrMetric

TODO: Figure out multiclass generalizations

ThresholdTprMetric

TODO: Figure out multiclass generalizations

TprMetric

TODO: Figure out multiclass generalizations

Attributes

__author__

simpleml.metrics.__author__ = 'Elisha Yadgaran'[source]
class simpleml.metrics.AccuracyMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.BinaryClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: ClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _create_confusion_matrix(thresholds, probabilities, labels)

Independent computation method (easier testing)

property accuracy(self)

Convenience property for the Accuracy Rate ((TP+TN)/(TP+FP+TN+FN))

property confusion_matrix(self)

Property method to return (or generate) dataframe of confusion matrix at each threshold

create_confusion_matrix(self)

Iterate through each threshold and compute confusion matrix
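The per-threshold iteration can be sketched roughly as follows (a hypothetical illustration; the library's actual implementation likely returns a pandas DataFrame rather than a list of dicts):

```python
def create_confusion_matrix(thresholds, probabilities, labels):
    """For each threshold, binarize probabilities and count TP/FP/TN/FN."""
    rows = []
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probabilities]
        tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
        tn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 0)
        fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
        rows.append({'threshold': t, 'tp': tp, 'fp': fp, 'tn': tn, 'fn': fn})
    return rows
```

Every threshold-based convenience property below (accuracy, f1, the various rates) is then a simple ratio over these four counts.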

static dedupe_curve(keys, values, maximize=True, round_places=3)

Method to deduplicate multiple values for the same key on a curve (e.g. multiple thresholds with the same FPR and different TPR for ROC)

Parameters

maximize – Boolean, whether to choose the maximum value for each unique key or the minimum
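The deduplication described above can be sketched as follows (an illustrative reimplementation under the stated semantics, not the library's source):

```python
def dedupe_curve(keys, values, maximize=True, round_places=3):
    """Keep one value per rounded key: the max (or min, if maximize=False)."""
    pick = max if maximize else min
    agg = {}
    for k, v in zip(keys, values):
        k = round(k, round_places)
        agg[k] = pick(agg[k], v) if k in agg else v
    return sorted(agg.items())
```

Choosing the maximum per key traces the upper envelope of a curve like ROC, where several thresholds can share an FPR but differ in TPR.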

property f1(self)

Convenience property for the F1 Score (2*TP/(2*TP+FP+FN))

property false_discovery_rate(self)

Convenience property for the False Discovery Rate (FP/(FP+TP))

property false_negative_rate(self)

Convenience property for the False Negative Rate (FN/(TP+FN))

property false_omission_rate(self)

Convenience property for the False Omission Rate (FN/(TN+FN))

property false_positive_rate(self)

Convenience property for the False Positive Rate (FP/(FP+TN))

property informedness(self)

Convenience property for the Informedness (TPR+TNR-1)

property labels(self)
property markedness(self)

Convenience property for the Markedness (PPV+NPV-1)

property matthews_correlation_coefficient(self)

Convenience property for the Matthews Correlation Coefficient ((TP*TN-FP*FN)/((FP+TP)*(TP+FN)*(TN+FP)*(TN+FN))^0.5)
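A small worked example of the MCC formula, using illustrative counts (not taken from the library):

```python
def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = ((fp + tp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return numerator / denominator if denominator else 0.0

# With TP=50, FP=10, TN=30, FN=10:
#   numerator   = 50*30 - 10*10 = 1400
#   denominator = (60*60*40*40)^0.5 = 2400
#   MCC         = 1400/2400 ≈ 0.583
```

The guard for a zero denominator handles degenerate confusion matrices (e.g. no predicted positives), where MCC is conventionally reported as 0.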

property negative_predictive_value(self)

Convenience property for the Negative Predictive Value (TN/(TN+FN))

property positive_predictive_value(self)

Convenience property for the Positive Predictive Value (TP/(FP+TP))

property predicted_negative_rate(self)

Convenience property for the Predicted Negative Rate ((TN+FN)/(TP+FP+TN+FN))

property predicted_positive_rate(self)

Convenience property for the Predicted Positive Rate ((TP+FP)/(TP+FP+TN+FN))

property predictions(self)
property probabilities(self)
property thresholds(self)

Convenience property for the probability thresholds

property true_negative_rate(self)

Convenience property for the True Negative Rate (TN/(FP+TN))

property true_positive_rate(self)

Convenience property for the True Positive Rate (TP/(TP+FN))

static validate_labels(labels)
class simpleml.metrics.ClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: simpleml.metrics.base_metric.Metric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

_get_split(self, column)
Parameters

column (str) –

Return type

Any

property labels(self)
Return type

Any

property predictions(self)
Return type

Any

property probabilities(self)
Return type

Any

static validate_predictions(predictions)
Parameters

predictions (Any) –

Return type

None

class simpleml.metrics.F1ScoreMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.Metric(name=None, has_external_files=False, author=None, project=None, version_description=None, save_patterns=None, **kwargs)[source]

Bases: AbstractMetric

Base class for all Metric objects

model_id: foreign key to the model that was used to generate predictions

TODO: Should the join criteria be a composite of model and dataset, to allow duplicate metric objects computed over different test datasets?

Parameters
  • name (Optional[str]) –

  • has_external_files (bool) –

  • author (Optional[str]) –

  • project (Optional[str]) –

  • version_description (Optional[str]) –

  • save_patterns (Optional[Dict[str, List[str]]]) –

__table_args__
__tablename__ = 'metrics'
dataset
dataset_id
model
model_id
class simpleml.metrics.RocAucMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(probabilities, labels)
score(self)

Abstract method for each metric to define

Should set self.values
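ROC AUC can be computed as a rank statistic: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half. The sketch below illustrates that definition (a hypothetical reimplementation; the library may well delegate to an existing routine such as scikit-learn's `roc_auc_score`):

```python
def roc_auc(probabilities, labels):
    """Rank-based AUC over all positive/negative pairs (ties count 0.5)."""
    pos = [p for p, l in zip(probabilities, labels) if l == 1]
    neg = [p for p, l in zip(probabilities, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is O(n^2) and only suitable for illustration; production implementations sort once and integrate the ROC curve instead.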

class simpleml.metrics.ThresholdAccuracyMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdF1ScoreMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFdrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdForMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdInformednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMarkednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMccMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdNpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdPpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.TprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)