simpleml.metrics

Import modules to register class names in the global registry

Expose all classes through a single import module

Submodules

Package Contents

Classes

AccuracyMetric

TODO: Figure out multiclass generalizations

BinaryClassificationMetric

TODO: Figure out multiclass generalizations

ClassificationMetric

TODO: Figure out multiclass generalizations

F1ScoreMetric

TODO: Figure out multiclass generalizations

FprMetric

TODO: Figure out multiclass generalizations

FprTprMetric

TODO: Figure out multiclass generalizations

Metric

Base class for all Metric objects

RocAucMetric

TODO: Figure out multiclass generalizations

ThresholdAccuracyMetric

TODO: Figure out multiclass generalizations

ThresholdF1ScoreMetric

TODO: Figure out multiclass generalizations

ThresholdFdrMetric

TODO: Figure out multiclass generalizations

ThresholdFnrMetric

TODO: Figure out multiclass generalizations

ThresholdForMetric

TODO: Figure out multiclass generalizations

ThresholdFprMetric

TODO: Figure out multiclass generalizations

ThresholdInformednessMetric

TODO: Figure out multiclass generalizations

ThresholdMarkednessMetric

TODO: Figure out multiclass generalizations

ThresholdMccMetric

TODO: Figure out multiclass generalizations

ThresholdNpvMetric

TODO: Figure out multiclass generalizations

ThresholdPpvMetric

TODO: Figure out multiclass generalizations

ThresholdTnrMetric

TODO: Figure out multiclass generalizations

ThresholdTprMetric

TODO: Figure out multiclass generalizations

TprMetric

TODO: Figure out multiclass generalizations

Attributes

__author__

simpleml.metrics.__author__ = Elisha Yadgaran[source]
class simpleml.metrics.AccuracyMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)
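The aggregate contract can be illustrated with a minimal stand-in for `_score`. The actual implementation is not shown in these docs and likely delegates to a library; the function name and body below are illustrative only:

```python
def accuracy_score(predictions, labels):
    """Illustrative aggregate: fraction of predictions matching the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)
```

Called with hard class predictions (not probabilities), matching the documented `_score(predictions, labels)` signature.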

class simpleml.metrics.BinaryClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: ClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _create_confusion_matrix(thresholds, probabilities, labels)

Independent computation method (easier testing)

property accuracy(self)

Convenience property for the Accuracy Rate ((TP+TN)/(TP+FP+TN+FN))

property confusion_matrix(self)

Property method to return (or generate) a dataframe of the confusion matrix at each threshold

create_confusion_matrix(self)

Iterate through each threshold and compute the confusion matrix

static dedupe_curve(keys, values, maximize=True, round_places=3)

Method to deduplicate multiple values for the same key on a curve (e.g. multiple thresholds with the same fpr but different tpr on a roc curve)

Parameters

maximize – Boolean; whether to choose the maximum or the minimum value for each unique key
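The dedupe behavior described above can be sketched in a few lines. The signature mirrors the documented one, but the internals (rounding the key before grouping, returning sorted parallel lists) are assumptions for illustration:

```python
def dedupe_curve(keys, values, maximize=True, round_places=3):
    """Keep one value per (rounded) key: the max if maximize, else the min."""
    best = {}
    chooser = max if maximize else min
    for k, v in zip(keys, values):
        k = round(k, round_places)
        best[k] = chooser(best[k], v) if k in best else v
    # return sorted, parallel lists of unique keys and their chosen values
    deduped = sorted(best.items())
    return [k for k, _ in deduped], [v for _, v in deduped]
```

For a roc curve this collapses multiple thresholds sharing one fpr down to the best tpr at that fpr.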

property f1(self)

Convenience property for the F1 Score (2*TP/(2*TP+FP+FN))

property false_discovery_rate(self)

Convenience property for the False Discovery Rate (FP/(FP+TP))

property false_negative_rate(self)

Convenience property for the False Negative Rate (FN/(TP+FN))

property false_omission_rate(self)

Convenience property for the False Omission Rate (FN/(TN+FN))

property false_positive_rate(self)

Convenience property for the False Positive Rate (FP/(FP+TN))

property informedness(self)

Convenience property for the Informedness (TPR+TNR-1)

property labels(self)
property markedness(self)

Convenience property for the Markedness (PPV+NPV-1)

property matthews_correlation_coefficient(self)

Convenience property for the Matthews Correlation Coefficient ((TP*TN-FP*FN)/((FP+TP)*(TP+FN)*(TN+FP)*(TN+FN))^0.5)

property negative_predictive_value(self)

Convenience property for the Negative Predictive Value (TN/(TN+FN))

property positive_predictive_value(self)

Convenience property for the Positive Predictive Value (TP/(FP+TP))

property predicted_negative_rate(self)

Convenience property for the Predicted Negative Rate ((TN+FN)/(TP+FP+TN+FN))

property predicted_positive_rate(self)

Convenience property for the Predicted Positive Rate ((TP+FP)/(TP+FP+TN+FN))

property predictions(self)
property probabilities(self)
property thresholds(self)

Convenience property for the probability thresholds

property true_negative_rate(self)

Convenience property for the True Negative Rate (TN/(FP+TN))

property true_positive_rate(self)

Convenience property for the True Positive Rate (TP/(TP+FN))

static validate_labels(labels)
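The convenience properties above all derive from the same four confusion-matrix counts. A standalone sketch of the documented formulas, with the counts supplied directly rather than computed per threshold as the class does:

```python
def rates(tp, fp, tn, fn):
    """Derived rates for one confusion matrix, per the documented formulas."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "positive_predictive_value": tp / (fp + tp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        # informedness = TPR + TNR - 1
        "informedness": tp / (tp + fn) + tn / (fp + tn) - 1,
    }
```

The class computes these at every threshold; this helper shows a single cell of that table.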
class simpleml.metrics.ClassificationMetric(dataset_split=None, **kwargs)[source]

Bases: simpleml.metrics.base_metric.Metric

TODO: Figure out multiclass generalizations

Parameters

dataset_split (Optional[str]) – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

_get_split(self, column)
Parameters

column (str) –

Return type

Any

property labels(self)
Return type

Any

property predictions(self)
Return type

Any

property probabilities(self)
Return type

Any

static validate_predictions(predictions)
Parameters

predictions (Any) –

Return type

None

class simpleml.metrics.F1ScoreMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)

class simpleml.metrics.FprTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values
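The kind of (fpr, tpr) pairs a curve metric like this would store in `self.values` can be produced as below. The threshold grid and the returned shape are assumptions for illustration; the class derives its counts from the confusion-matrix machinery documented above:

```python
def fpr_tpr_curve(probabilities, labels, thresholds):
    """(fpr, tpr) point per threshold, thresholding probabilities at >= t."""
    points = []
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probabilities]
        tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
        tn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 0)
        fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points
```

Deduplicating these points by fpr (via `dedupe_curve`) yields a clean roc curve.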

class simpleml.metrics.Metric(dataset_id=None, model_id=None, **kwargs)[source]

Bases: simpleml.persistables.base_persistable.Persistable

Base class for all Metric objects

Parameters
  • dataset_id (Optional[Union[str, uuid.uuid4]]) –

  • model_id (Optional[Union[str, uuid.uuid4]]) –

object_type: str = METRIC
_get_dataset_split(self, **kwargs)

Default accessor for dataset data. Refers to the raw dataset splits, not the splits superimposed by pipelines. That means datasets that do not define explicit splits will have no notion of downstream splits (e.g. RandomSplitPipeline)

Return type

Any

_get_latest_version(self)

Versions should be autoincrementing for each object (constrained over friendly name and model). Executes a database lookup and increments.

Return type

int

_get_pipeline_split(self, column, split, **kwargs)

For the special case where the dataset is the same as the model's dataset, the dataset splits can refer to the pipeline-imposed splits rather than the dataset's inherent splits; use the pipeline split in that case. Ex: a RandomSplitPipeline on a NoSplitDataset evaluating "in_sample" performance

Parameters
  • column (str) –

  • split (str) –

Return type

Any

_hash(self)
Hash is the combination of the:
  1. Model

  2. Dataset (optional)

  3. Metric

  4. Config

Return type

str

_load_dataset(self)

Helper to fetch the dataset

_load_model(self)

Helper to fetch the model

add_dataset(self, dataset)

Setter method for dataset used

Parameters

dataset (simpleml.datasets.base_dataset.Dataset) –

Return type

None

add_model(self, model)

Setter method for model used

Parameters

model (simpleml.models.base_model.Model) –

Return type

None

property dataset(self)

Use a weakref to bind the linked dataset so it doesn't bloat usage; returns the dataset if still available, otherwise tries to fetch it

property model(self)

Use a weakref to bind the linked model so it doesn't bloat usage; returns the model if still available, otherwise tries to fetch it

save(self, **kwargs)

Extend parent function with a few additional save routines

Return type

None

abstract score(self, **kwargs)

Abstract method for each metric to define

Should set self.values
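The abstract contract (implement `score()`, have it set `self.values`) can be shown with a lightweight stand-in. The real `Metric` base additionally handles persistence, hashing, versioning, and dataset/model binding; the classes below are purely illustrative and hypothetical:

```python
from abc import ABC, abstractmethod


class MetricStandIn(ABC):
    """Illustrative stand-in for simpleml's Metric base class.

    Only the scoring contract is mirrored here.
    """

    @abstractmethod
    def score(self, **kwargs):
        """Each metric defines this; it should set self.values"""


class PrevalenceMetric(MetricStandIn):
    """Hypothetical metric: fraction of positive labels in a split."""

    def __init__(self, labels):
        self.labels = labels

    def score(self):
        self.values = sum(self.labels) / len(self.labels)
```

After `score()` runs, `values` holds the result; in the real class, `save()` would then persist it alongside the model and dataset references.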

class simpleml.metrics.RocAucMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(probabilities, labels)
score(self)

Abstract method for each metric to define

Should set self.values
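`_score(probabilities, labels)` takes probabilities rather than hard predictions because AUC is threshold-free. A pure-Python equivalent using the rank-sum (Mann-Whitney) identity, shown to make the contract concrete; ties between probabilities are not specially handled here, and the actual implementation likely delegates to a library:

```python
def roc_auc(probabilities, labels):
    """ROC AUC for binary labels via the rank-sum identity."""
    pairs = sorted(zip(probabilities, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # sum of 1-based ranks (by ascending probability) of the positive examples
    rank_sum = sum(rank for rank, (_, label) in enumerate(pairs, start=1)
                   if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A perfect ranking gives 1.0; a random one hovers near 0.5.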

class simpleml.metrics.ThresholdAccuracyMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values
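The "Threshold" family of metrics computes a statistic at every probability threshold instead of one aggregate value. A sketch for accuracy; the other ThresholdXxxMetric classes follow the same pattern with their respective formulas, and the stored structure here is an assumption:

```python
def threshold_accuracy(probabilities, labels, thresholds):
    """Accuracy at each probability threshold, keyed by threshold."""
    results = {}
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probabilities]
        correct = sum(p == l for p, l in zip(preds, labels))
        results[t] = correct / len(labels)
    return results
```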

class simpleml.metrics.ThresholdF1ScoreMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFdrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdForMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdFprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdInformednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMarkednessMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdMccMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdNpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdPpvMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTnrMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.ThresholdTprMetric(**kwargs)[source]

Bases: BinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

score(self)

Abstract method for each metric to define

Should set self.values

class simpleml.metrics.TprMetric(**kwargs)[source]

Bases: AggregateBinaryClassificationMetric

TODO: Figure out multiclass generalizations

Parameters

dataset_split – string denoting which dataset split to use; one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, to stay consistent with no split mapping to TRAIN in Pipeline

static _score(predictions, labels)

Each aggregate needs to define a separate private method to actually calculate the aggregate

Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)