simpleml.metrics

Import modules to register class names in the global registry

Expose classes in a single import module

Submodules

Package Contents

Classes

AccuracyMetric

TODO: Figure out multiclass generalizations

BinaryClassificationMetric

TODO: Figure out multiclass generalizations

ClassificationMetric

TODO: Figure out multiclass generalizations

F1ScoreMetric

TODO: Figure out multiclass generalizations

FprMetric

TODO: Figure out multiclass generalizations

FprTprMetric

TODO: Figure out multiclass generalizations

Metric

Base class for all Metric objects

RocAucMetric

TODO: Figure out multiclass generalizations

ThresholdAccuracyMetric

TODO: Figure out multiclass generalizations

ThresholdF1ScoreMetric

TODO: Figure out multiclass generalizations

ThresholdFdrMetric

TODO: Figure out multiclass generalizations

ThresholdFnrMetric

TODO: Figure out multiclass generalizations

ThresholdForMetric

TODO: Figure out multiclass generalizations

ThresholdFprMetric

TODO: Figure out multiclass generalizations

ThresholdInformednessMetric

TODO: Figure out multiclass generalizations

ThresholdMarkednessMetric

TODO: Figure out multiclass generalizations

ThresholdMccMetric

TODO: Figure out multiclass generalizations

ThresholdNpvMetric

TODO: Figure out multiclass generalizations

ThresholdPpvMetric

TODO: Figure out multiclass generalizations

ThresholdTnrMetric

TODO: Figure out multiclass generalizations

ThresholdTprMetric

TODO: Figure out multiclass generalizations

TprMetric

TODO: Figure out multiclass generalizations

Attributes

__author__

simpleml.metrics.__author__ = 'Elisha Yadgaran'[source]
class simpleml.metrics.AccuracyMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.BinaryClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: ClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _create_confusion_matrix(thresholds, probabilities, labels)

Independent computation method (easier testing)

property accuracy(self)

Convenience property for the Accuracy Rate ((TP+TN)/(TP+FP+TN+FN))

property confusion_matrix(self)

Property method to return (or generate) dataframe of confusion matrix at each threshold

create_confusion_matrix(self)

Iterate through each threshold and compute confusion matrix
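The per-threshold iteration can be sketched roughly as follows (a hypothetical illustration; the library's actual implementation likely returns a pandas DataFrame rather than a list of dicts):

```python
def create_confusion_matrix(thresholds, probabilities, labels):
    """For each threshold, binarize probabilities and count TP/FP/TN/FN."""
    rows = []
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probabilities]
        tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
        tn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 0)
        fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
        rows.append({'threshold': t, 'tp': tp, 'fp': fp, 'tn': tn, 'fn': fn})
    return rows
```

Every threshold-based convenience property below (accuracy, f1, the various rates) is then a simple ratio over these four counts.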

static dedupe_curve(keys, values, maximize=True, round_places=3)

Method to deduplicate multiple values for the same key on a curve (e.g. multiple thresholds with the same FPR and different TPR for ROC)

Parameters

maximize – Boolean, whether to choose the maximum value for each unique key or the minimum
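The deduplication described above can be sketched as follows (an illustrative reimplementation under the stated semantics, not the library's source):

```python
def dedupe_curve(keys, values, maximize=True, round_places=3):
    """Keep one value per rounded key: the max (or min, if maximize=False)."""
    pick = max if maximize else min
    agg = {}
    for k, v in zip(keys, values):
        k = round(k, round_places)
        agg[k] = pick(agg[k], v) if k in agg else v
    return sorted(agg.items())
```

Choosing the maximum per key traces the upper envelope of a curve like ROC, where several thresholds can share an FPR but differ in TPR.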

property f1(self)

Convenience property for the F1 Score (2*TP/(2*TP+FP+FN))

property false_discovery_rate(self)

Convenience property for the False Discovery Rate (FP/(FP+TP))

property false_negative_rate(self)

Convenience property for the False Negative Rate (FN/(TP+FN))

property false_omission_rate(self)

Convenience property for the False Omission Rate (FN/(TN+FN))

property false_positive_rate(self)

Convenience property for the False Positive Rate (FP/(FP+TN))

property informedness(self)

Convenience property for the Informedness (TPR+TNR-1)

property labels(self)
property markedness(self)

Convenience property for the Markedness (PPV+NPV-1)

property matthews_correlation_coefficient(self)

Convenience property for the Matthews Correlation Coefficient ((TP*TN-FP*FN)/((FP+TP)*(TP+FN)*(TN+FP)*(TN+FN))^0.5)
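A small worked example of the MCC formula, using illustrative counts (not taken from the library):

```python
def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = ((fp + tp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return numerator / denominator if denominator else 0.0

# With TP=50, FP=10, TN=30, FN=10:
#   numerator   = 50*30 - 10*10 = 1400
#   denominator = (60*60*40*40)^0.5 = 2400
#   MCC         = 1400/2400 ≈ 0.583
```

The guard for a zero denominator handles degenerate confusion matrices (e.g. no predicted positives), where MCC is conventionally reported as 0.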

property negative_predictive_value(self)

Convenience property for the Negative Predictive Value (TN/(TN+FN))

property positive_predictive_value(self)

Convenience property for the Positive Predictive Value (TP/(FP+TP))

property predicted_negative_rate(self)

Convenience property for the Predicted Negative Rate ((TN+FN)/(TP+FP+TN+FN))

property predicted_positive_rate(self)

Convenience property for the Predicted Positive Rate ((TP+FP)/(TP+FP+TN+FN))

property predictions(self)
property probabilities(self)
property thresholds(self)

Convenience property for the probability thresholds

property true_negative_rate(self)

Convenience property for the True Negative Rate (TN/(FP+TN))

property true_positive_rate(self)

Convenience property for the True Positive Rate (TP/(TP+FN))

static validate_labels(labels)
class simpleml.metrics.ClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: simpleml.metrics.base_metric.Metric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

_get_split(self, column)
Parameters

column (str) –

Return type

Any

property labels(self)
Return type

Any

property predictions(self)
Return type

Any

property probabilities(self)
Return type

Any

static validate_predictions(predictions)
Parameters

predictions (Any) –

Return type

None

class simpleml.metrics.F1ScoreMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.Metric(name=None, has_external_files=False, author=None, project=None, version_description=None, save_patterns=None, **kwargs)[source]

Bases: AbstractMetric

Base class for all Metric objects

model_id: foreign key to the model that was used to generate predictions

TODO: Should the join criteria be a composite of model and dataset, to allow duplicate metric objects computed over different test datasets?

Parameters
  • name (Optional[str]) –

  • has_external_files (bool) –

  • author (Optional[str]) –

  • project (Optional[str]) –

  • version_description (Optional[str]) –

  • save_patterns (Optional[Dict[str, List[str]]]) –

__table_args__
__tablename__ = 'metrics'
dataset
dataset_id
model
model_id
class simpleml.metrics.RocAucMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(probabilities, labels)
score(self)

Abstract method for each metric to define

Should set self.values
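ROC AUC can be computed as a rank statistic: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half. The sketch below illustrates that definition (a hypothetical reimplementation; the library may well delegate to an existing routine such as scikit-learn's `roc_auc_score`):

```python
def roc_auc(probabilities, labels):
    """Rank-based AUC over all positive/negative pairs (ties count 0.5)."""
    pos = [p for p, l in zip(probabilities, labels) if l == 1]
    neg = [p for p, l in zip(probabilities, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is O(n^2) and only suitable for illustration; production implementations sort once and integrate the ROC curve instead.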

class simpleml.metrics.ThresholdAccuracyMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdF1ScoreMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFdrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdForMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdInformednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMarkednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMccMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdNpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdPpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.TprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)