simpleml.metrics
Import modules to register class names in global registry
Expose classes in one import module
Submodules
Package Contents
Classes
AccuracyMetric – TODO: Figure out multiclass generalizations
BinaryClassificationMetric – TODO: Figure out multiclass generalizations
ClassificationMetric – TODO: Figure out multiclass generalizations
F1ScoreMetric – TODO: Figure out multiclass generalizations
FprMetric – TODO: Figure out multiclass generalizations
FprTprMetric – TODO: Figure out multiclass generalizations
Metric – Base class for all Metric objects
RocAucMetric – TODO: Figure out multiclass generalizations
ThresholdAccuracyMetric – TODO: Figure out multiclass generalizations
ThresholdF1ScoreMetric – TODO: Figure out multiclass generalizations
ThresholdFdrMetric – TODO: Figure out multiclass generalizations
ThresholdFnrMetric – TODO: Figure out multiclass generalizations
ThresholdForMetric – TODO: Figure out multiclass generalizations
ThresholdFprMetric – TODO: Figure out multiclass generalizations
ThresholdInformednessMetric – TODO: Figure out multiclass generalizations
ThresholdMarkednessMetric – TODO: Figure out multiclass generalizations
ThresholdMccMetric – TODO: Figure out multiclass generalizations
ThresholdNpvMetric – TODO: Figure out multiclass generalizations
ThresholdPpvMetric – TODO: Figure out multiclass generalizations
ThresholdTnrMetric – TODO: Figure out multiclass generalizations
ThresholdTprMetric – TODO: Figure out multiclass generalizations
TprMetric – TODO: Figure out multiclass generalizations
Attributes
- class simpleml.metrics.AccuracyMetric(**kwargs)[source]
Bases:
AggregateBinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- static _score(predictions, labels)
Each aggregate needs to define a separate private method to actually calculate the aggregate
Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)
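As a sketch, the `_score` contract for an aggregate metric like `AccuracyMetric` can be illustrated in isolation. The function below is a standalone stand-in (an assumption; the real implementation likely delegates to a library call such as sklearn's `accuracy_score`):

```python
# Minimal sketch of the aggregate-metric _score contract: take
# predictions and labels, return a single aggregate value. Pure
# Python so it stands alone; not simpleml's actual implementation.

def accuracy_score(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    matches = sum(1 for p, y in zip(predictions, labels) if p == y)
    return matches / len(labels)

preds = [1, 0, 1, 1]
labels = [1, 0, 0, 1]
print(accuracy_score(preds, labels))  # 3 of 4 match -> 0.75
```

Keeping the computation in a static method like this is what makes the "easier testing" note above possible: values can be fed in directly without constructing a persisted Metric.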
- class simpleml.metrics.BinaryClassificationMetric(dataset_split=None, **kwargs)[source]
Bases:
ClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split (Optional[str]) – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- static _create_confusion_matrix(thresholds, probabilities, labels)
Independent computation method (easier testing)
- property accuracy(self)
Convenience property for the Accuracy Rate ((TP+TN)/(TP+FP+TN+FN))
- property confusion_matrix(self)
Property method to return (or generate) dataframe of confusion matrix at each threshold
- create_confusion_matrix(self)
Iterate through each threshold and compute confusion matrix
- static dedupe_curve(keys, values, maximize=True, round_places=3)
Method to deduplicate multiple values for the same key on a curve (e.g. multiple thresholds with the same FPR and different TPR for ROC)
- Parameters
maximize – Boolean, whether to choose the maximum value for each unique key or the minimum
- property f1(self)
Convenience property for the F1 Score (2*TP/(2*TP+FP+FN))
- property false_discovery_rate(self)
Convenience property for the False Discovery Rate (FP/(FP+TP))
- property false_negative_rate(self)
Convenience property for the False Negative Rate (FN/(TP+FN))
- property false_omission_rate(self)
Convenience property for the False Omission Rate (FN/(TN+FN))
- property false_positive_rate(self)
Convenience property for the False Positive Rate (FP/(FP+TN))
- property informedness(self)
Convenience property for the Informedness (TPR+TNR-1)
- property labels(self)
- property markedness(self)
Convenience property for the Markedness (PPV+NPV-1)
- property matthews_correlation_coefficient(self)
Convenience property for the Matthews Correlation Coefficient ((TP*TN-FP*FN)/((FP+TP)*(TP+FN)*(TN+FP)*(TN+FN))^0.5)
- property negative_predictive_value(self)
Convenience property for the Negative Predictive Value (TN/(TN+FN))
- property positive_predictive_value(self)
Convenience property for the Positive Predictive Value (TP/(TP+FP))
- property predicted_negative_rate(self)
Convenience property for the Predicted Negative Rate ((TN+FN)/(TP+FP+TN+FN))
- property predicted_positive_rate(self)
Convenience property for the Predicted Positive Rate ((TP+FP)/(TP+FP+TN+FN))
- property predictions(self)
- property probabilities(self)
- property thresholds(self)
Convenience property for the probability thresholds
- property true_negative_rate(self)
Convenience property for the True Negative Rate (TN/(FP+TN))
- property true_positive_rate(self)
Convenience property for the True Positive Rate (TP/(TP+FN))
- static validate_labels(labels)
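The confusion-matrix machinery above can be sketched independently. `confusion_matrix_at` is a hypothetical helper, not simpleml's API; it shows how the TP/FP/TN/FN counts behind the convenience properties arise at a single probability threshold:

```python
# Sketch of a single row of the per-threshold confusion matrix.
# Function and key names are illustrative, not simpleml's actual API.

def confusion_matrix_at(threshold, probabilities, labels):
    """Counts of TP/FP/TN/FN when predicting positive at `threshold`."""
    tp = fp = tn = fn = 0
    for prob, label in zip(probabilities, labels):
        predicted = prob >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and not label:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

probs = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
cm = confusion_matrix_at(0.5, probs, labels)
tpr = cm["tp"] / (cm["tp"] + cm["fn"])  # True Positive Rate: TP/(TP+FN)
print(cm, tpr)  # {'tp': 2, 'fp': 0, 'tn': 2, 'fn': 0} 1.0
```

`create_confusion_matrix` iterates this computation over every threshold; each convenience property (TPR, FPR, MCC, …) is then a column derived from the four counts.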
- class simpleml.metrics.ClassificationMetric(dataset_split=None, **kwargs)[source]
Bases:
simpleml.metrics.base_metric.Metric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split (Optional[str]) – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- property labels(self)
- Return type
Any
- property predictions(self)
- Return type
Any
- property probabilities(self)
- Return type
Any
- static validate_predictions(predictions)
- Parameters
predictions (Any) –
- Return type
None
- class simpleml.metrics.F1ScoreMetric(**kwargs)[source]
Bases:
AggregateBinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- static _score(predictions, labels)
Each aggregate needs to define a separate private method to actually calculate the aggregate
Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)
- class simpleml.metrics.FprMetric(**kwargs)[source]
Bases:
AggregateBinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- static _score(predictions, labels)
Each aggregate needs to define a separate private method to actually calculate the aggregate
Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)
- class simpleml.metrics.FprTprMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.Metric(dataset_id=None, model_id=None, **kwargs)[source]
Bases:
simpleml.persistables.base_persistable.Persistable
Base class for all Metric objects
- Parameters
- object_type: str = METRIC
- _get_dataset_split(self, **kwargs)
Default accessor for dataset data. REFERS TO RAW DATASETS, not the pipeline-superimposed splits. That means datasets that do not define explicit splits will have no notion of downstream splits (e.g. RandomSplitPipeline)
- Return type
Any
- _get_latest_version(self)
Versions should be autoincrementing for each object (constrained over friendly name and model). Executes a database lookup and increments.
- Return type
- _get_pipeline_split(self, column, split, **kwargs)
For the special case where the dataset is the same as the model's dataset, the dataset splits can refer to the pipeline-imposed splits rather than the dataset's inherent splits. Use the pipeline split in that case, e.g. a RandomSplitPipeline on a NoSplitDataset evaluating "in_sample" performance
- _load_dataset(self)
Helper to fetch the dataset
- _load_model(self)
Helper to fetch the model
- add_dataset(self, dataset)
Setter method for dataset used
- Parameters
dataset (simpleml.datasets.base_dataset.Dataset) –
- Return type
None
- add_model(self, model)
Setter method for model used
- Parameters
model (simpleml.models.base_model.Model) –
- Return type
None
- property dataset(self)
Use a weakref to bind the linked dataset so it doesn't bloat usage; returns the dataset if still available, or tries to fetch it otherwise
- property model(self)
Use a weakref to bind the linked model so it doesn't bloat usage; returns the model if still available, or tries to fetch it otherwise
- save(self, **kwargs)
Extend parent function with a few additional save routines
- Return type
None
- abstract score(self, **kwargs)
Abstract method for each metric to define
Should set self.values
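The score() contract above (each subclass computes its result and sets self.values) can be mirrored with a minimal standalone mock. `MetricSketch` and `ConstantMetric` are illustrative names for this sketch only, not part of simpleml:

```python
# Standalone sketch of the Metric contract: subclasses implement
# score() and store the result on self.values. This mock mirrors the
# documented interface; it is NOT the real simpleml.metrics.Metric
# (which is a persisted object with dataset/model links).
from abc import ABC, abstractmethod

class MetricSketch(ABC):
    object_type = "METRIC"

    def __init__(self):
        self.values = None  # populated by score()

    @abstractmethod
    def score(self, **kwargs):
        """Each metric defines this and sets self.values."""

class ConstantMetric(MetricSketch):
    def score(self, **kwargs):
        # A real subclass would pull predictions/labels from its
        # linked model and dataset before computing.
        self.values = {"agg": 1.0}

m = ConstantMetric()
m.score()
print(m.values)  # {'agg': 1.0}
```

In the real class, save() then persists whatever score() placed on self.values alongside the dataset and model references.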
- class simpleml.metrics.RocAucMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- static _score(probabilities, labels)
- score(self)
Abstract method for each metric to define
Should set self.values
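A minimal sketch of the ROC AUC value this metric produces, using the pairwise-ranking definition of AUC (the probability that a random positive outranks a random negative). `roc_auc` is a hypothetical standalone function; the real `_score` presumably delegates to a library routine:

```python
# Sketch of ROC AUC via its pairwise-ranking definition: the fraction
# of (positive, negative) pairs where the positive gets the higher
# probability (ties count half). O(n^2), fine for illustration.

def roc_auc(probabilities, labels):
    positives = [p for p, y in zip(probabilities, labels) if y == 1]
    negatives = [p for p, y in zip(probabilities, labels) if y == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))

# Positives score 0.35 and 0.8; they outrank negatives in 3 of 4 pairs.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Note this works directly on probabilities, which is why the class signature above takes probabilities rather than thresholded predictions.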
- class simpleml.metrics.ThresholdAccuracyMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
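The curve pattern shared by all the Threshold* metrics below can be sketched as a statistic recomputed at every probability threshold, here accuracy. `threshold_accuracy_curve` is an illustrative name, not simpleml's API:

```python
# Sketch of a threshold-curve metric: accuracy evaluated at each
# probability threshold, yielding {threshold: value} rather than a
# single aggregate. Names are illustrative, not simpleml's API.

def threshold_accuracy_curve(thresholds, probabilities, labels):
    curve = {}
    for t in thresholds:
        # A prediction is correct when (prob >= t) agrees with the label.
        correct = sum(
            1 for p, y in zip(probabilities, labels) if (p >= t) == bool(y)
        )
        curve[t] = correct / len(labels)
    return curve

probs = [0.2, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 1]
print(threshold_accuracy_curve([0.0, 0.5, 1.0], probs, labels))
# {0.0: 0.5, 0.5: 1.0, 1.0: 0.5}
```

Each of the other Threshold* metrics follows the same shape, swapping accuracy for its rate (FPR, NPV, MCC, …) derived from the confusion matrix at each threshold.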
- class simpleml.metrics.ThresholdF1ScoreMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdFdrMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdFnrMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdForMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdFprMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdInformednessMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdMarkednessMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdMccMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdNpvMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdPpvMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdTnrMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.ThresholdTprMetric(**kwargs)[source]
Bases:
BinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- score(self)
Abstract method for each metric to define
Should set self.values
- class simpleml.metrics.TprMetric(**kwargs)[source]
Bases:
AggregateBinaryClassificationMetric
TODO: Figure out multiclass generalizations
- Parameters
dataset_split – string denoting which dataset split to use; can be one of TRAIN, VALIDATION, or Other (Other gets no prefix). Defaults to the train split, consistent with no split mapping to Train in Pipeline
- static _score(predictions, labels)
Each aggregate needs to define a separate private method to actually calculate the aggregate
Separated from the public score method to enable easier testing and extension (values can be passed from non-internal properties)