simpleml.models.classifiers.sklearn.multiclass

Wrapper module around sklearn.multiclass

Module Contents

Classes

SklearnOneVsOneClassifier

No different than the base model; included here just to maintain the pattern.

SklearnOneVsRestClassifier

No different than the base model; included here just to maintain the pattern.

SklearnOutputCodeClassifier

No different than the base model; included here just to maintain the pattern.

WrappedSklearnOneVsOneClassifier

One-vs-one multiclass strategy.

WrappedSklearnOneVsRestClassifier

One-vs-the-rest (OvR) multiclass strategy.

WrappedSklearnOutputCodeClassifier

(Error-Correcting) Output-Code multiclass strategy.

Attributes

__author__

simpleml.models.classifiers.sklearn.multiclass.__author__ = Elisha Yadgaran
class simpleml.models.classifiers.sklearn.multiclass.SklearnOneVsOneClassifier(has_external_files=True, external_model_kwargs=None, params=None, fitted=False, pipeline_id=None, **kwargs)

Bases: simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier

No different than the base model; included here just to maintain the pattern: Generic Base -> Library Base -> Domain Base -> Individual Models (e.g. [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).

Passthrough kwargs to external models need to be separated explicitly, since most external constructors do not support arbitrary **kwargs.

Two patterns are supported: full initialization in the constructor, or stepwise configuration before fit and save.

Parameters
  • has_external_files (bool) –

  • external_model_kwargs (Optional[Dict[str, Any]]) –

  • params (Optional[Dict[str, Any]]) –

  • fitted (bool) –

  • pipeline_id (Optional[Union[str, uuid.uuid4]]) –

_create_external_model(self, **kwargs)

Abstract method for each subclass to implement; should return the desired model object.
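
To make the two supported patterns concrete, here is a minimal usage sketch. This is assumed usage, not taken from this page: the LinearSVC estimator and the stepwise calls are illustrative.

from sklearn.svm import LinearSVC

from simpleml.models.classifiers.sklearn.multiclass import SklearnOneVsOneClassifier

# Pattern 1: full initialization in the constructor. external_model_kwargs
# are passed straight through to the wrapped
# sklearn.multiclass.OneVsOneClassifier constructor.
model = SklearnOneVsOneClassifier(
    external_model_kwargs={'estimator': LinearSVC(random_state=0)},
)

# Pattern 2: construct first, configure stepwise, then fit and save.
model = SklearnOneVsOneClassifier()
# ... associate a persisted pipeline here (e.g. via the documented
# pipeline_id parameter); then:
model.fit()
model.save()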

class simpleml.models.classifiers.sklearn.multiclass.SklearnOneVsRestClassifier(has_external_files=True, external_model_kwargs=None, params=None, fitted=False, pipeline_id=None, **kwargs)

Bases: simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier

No different than the base model; included here just to maintain the pattern: Generic Base -> Library Base -> Domain Base -> Individual Models (e.g. [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).

Passthrough kwargs to external models need to be separated explicitly, since most external constructors do not support arbitrary **kwargs.

Two patterns are supported: full initialization in the constructor, or stepwise configuration before fit and save.

Parameters
  • has_external_files (bool) –

  • external_model_kwargs (Optional[Dict[str, Any]]) –

  • params (Optional[Dict[str, Any]]) –

  • fitted (bool) –

  • pipeline_id (Optional[Union[str, uuid.uuid4]]) –

_create_external_model(self, **kwargs)

Abstract method for each subclass to implement; should return the desired model object.

class simpleml.models.classifiers.sklearn.multiclass.SklearnOutputCodeClassifier(has_external_files=True, external_model_kwargs=None, params=None, fitted=False, pipeline_id=None, **kwargs)

Bases: simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier

No different than the base model; included here just to maintain the pattern: Generic Base -> Library Base -> Domain Base -> Individual Models (e.g. [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).

Passthrough kwargs to external models need to be separated explicitly, since most external constructors do not support arbitrary **kwargs.

Two patterns are supported: full initialization in the constructor, or stepwise configuration before fit and save.

Parameters
  • has_external_files (bool) –

  • external_model_kwargs (Optional[Dict[str, Any]]) –

  • params (Optional[Dict[str, Any]]) –

  • fitted (bool) –

  • pipeline_id (Optional[Union[str, uuid.uuid4]]) –

_create_external_model(self, **kwargs)

Abstract method for each subclass to implement; should return the desired model object.
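
For reference, a plausible sketch of how each subclass satisfies this hook (assumed; the actual simpleml source may differ): it instantiates its corresponding wrapped class, forwarding the passthrough kwargs.

def _create_external_model(self, **kwargs):
    # kwargs arrive via external_model_kwargs from the constructor; the
    # wrapped class is one of the Wrapped* classes documented below.
    return WrappedSklearnOutputCodeClassifier(**kwargs)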

class simpleml.models.classifiers.sklearn.multiclass.WrappedSklearnOneVsOneClassifier(estimator, *, n_jobs=None)

Bases: sklearn.multiclass.OneVsOneClassifier, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin

One-vs-one multiclass strategy.

This strategy consists in fitting one classifier per class pair. At prediction time, the class which received the most votes is selected. Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest, due to its O(n_classes^2) complexity. However, this method may be advantageous for algorithms such as kernel algorithms which don't scale well with n_samples. This is because each individual learning problem only involves a small subset of the data whereas, with one-vs-the-rest, the complete dataset is used n_classes times.

Read more in the User Guide.
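
As a quick illustration of the quadratic growth in classifier count (illustrative arithmetic, not part of the sklearn docstring):

# OvO fits n_classes * (n_classes - 1) / 2 binary classifiers,
# versus n_classes for one-vs-the-rest:
for n_classes in (3, 10, 100):
    print(n_classes, n_classes * (n_classes - 1) // 2)  # 3, 45, 4950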

Parameters

estimator : estimator object

An estimator object implementing fit and one of decision_function or predict_proba.

n_jobs : int, default=None

The number of jobs to use for the computation: the n_classes * (n_classes - 1) / 2 OVO problems are computed in parallel.

None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

estimators_ : list of n_classes * (n_classes - 1) / 2 estimators

Estimators used for predictions.

classes_ : numpy array of shape [n_classes]

Array containing labels.

n_classes_ : int

Number of classes.

pairwise_indices_ : list, length = len(estimators_), or None

Indices of samples used when training the estimators. None when estimator’s pairwise tag is False.

Deprecated since version 0.24: The _pairwise attribute is deprecated in 0.24. From 1.1 (renaming of 0.25) and onward, pairwise_indices_ will use the pairwise estimator tag instead.

n_features_in_ : int

Number of features seen during fit.

New in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

See Also

OneVsRestClassifier : One-vs-all multiclass strategy.

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.multiclass import OneVsOneClassifier
>>> from sklearn.svm import LinearSVC
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, shuffle=True, random_state=0)
>>> clf = OneVsOneClassifier(
...     LinearSVC(random_state=0)).fit(X_train, y_train)
>>> clf.predict(X_test[:10])
array([2, 1, 0, 2, 0, 2, 0, 1, 1, 1])
class simpleml.models.classifiers.sklearn.multiclass.WrappedSklearnOneVsRestClassifier(estimator, *, n_jobs=None)

Bases: sklearn.multiclass.OneVsRestClassifier, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin

One-vs-the-rest (OvR) multiclass strategy.

Also known as one-vs-all, this strategy consists in fitting one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy for multiclass classification and is a fair default choice.

OneVsRestClassifier can also be used for multilabel classification. To use this feature, provide an indicator matrix for the target y when calling .fit. In other words, the target labels should be formatted as a 2D binary (0/1) matrix, where [i, j] == 1 indicates the presence of label j in sample i. This estimator uses the binary relevance method to perform multilabel classification, which involves training one binary classifier independently for each label.

Read more in the User Guide.

Parameters

estimator : estimator object

An estimator object implementing fit and one of decision_function or predict_proba.

n_jobs : int, default=None

The number of jobs to use for the computation: the n_classes one-vs-rest problems are computed in parallel.

None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Changed in version 0.20: n_jobs default changed from 1 to None.

Attributes

estimators_ : list of n_classes estimators

Estimators used for predictions.

coef_ : ndarray of shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function. This attribute exists only if the estimators_ defines coef_.

Deprecated since version 0.24: This attribute is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26). If you use this attribute in RFE or SelectFromModel, you may pass a callable to the importance_getter parameter that extracts the feature importances from estimators_.

intercept_ : ndarray of shape (1, 1) or (n_classes, 1)

If y is binary, the shape is (1, 1); otherwise (n_classes, 1). This attribute exists only if the estimators_ defines intercept_.

Deprecated since version 0.24: This attribute is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26). If you use this attribute in RFE or SelectFromModel, you may pass a callable to the importance_getter parameter that extracts the feature importances from estimators_.

classes_ : array, shape = [n_classes]

Class labels.

n_classes_ : int

Number of classes.

label_binarizer_ : LabelBinarizer object

Object used to transform multiclass labels to binary labels and vice-versa.

multilabel_ : boolean

Whether a OneVsRestClassifier is a multilabel classifier.

n_features_in_ : int

Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

New in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

New in version 1.0.

See Also

MultiOutputClassifier : Alternate way of extending an estimator for multilabel classification.

sklearn.preprocessing.MultiLabelBinarizer : Transform iterable of iterables to binary indicator matrix.

Examples

>>> import numpy as np
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.svm import SVC
>>> X = np.array([
...     [10, 10],
...     [8, 10],
...     [-5, 5.5],
...     [-5.4, 5.5],
...     [-20, -20],
...     [-15, -20]
... ])
>>> y = np.array([0, 0, 1, 1, 2, 2])
>>> clf = OneVsRestClassifier(SVC()).fit(X, y)
>>> clf.predict([[-19, -20], [9, 9], [-5, 5]])
array([2, 0, 1])
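
The example above is plain multiclass; for the multilabel usage described earlier, y is passed as a binary indicator matrix instead. A minimal sketch with made-up data:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
# One column per label; y[i, j] == 1 marks the presence of label j in sample i.
y = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])
clf = OneVsRestClassifier(SVC()).fit(X, y)
print(clf.multilabel_)           # True
print(clf.predict(X[:2]).shape)  # (2, 3): one indicator column per label
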
class simpleml.models.classifiers.sklearn.multiclass.WrappedSklearnOutputCodeClassifier(estimator, *, code_size=1.5, random_state=None, n_jobs=None)

Bases: sklearn.multiclass.OutputCodeClassifier, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin

(Error-Correcting) Output-Code multiclass strategy.

Output-code based strategies consist in representing each class with a binary code (an array of 0s and 1s). At fitting time, one binary classifier per bit in the code book is fitted. At prediction time, the classifiers are used to project new points in the class space and the class closest to the points is chosen. The main advantage of these strategies is that the number of classifiers used can be controlled by the user, either for compressing the model (0 < code_size < 1) or for making the model more robust to errors (code_size > 1). See the documentation for more details.

Read more in the User Guide.
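
To make the code_size trade-off concrete, some illustrative arithmetic (the int(n_classes * code_size) count is stated under estimators_ below):

# Binary classifiers fitted for a 10-class problem at various code sizes:
n_classes = 10
for code_size in (0.5, 1.0, 1.5, 2.0):
    print(code_size, int(n_classes * code_size))
# 0.5 -> 5  (compression: fewer than the 10 one-vs-rest classifiers)
# 2.0 -> 20 (redundancy, for error-correcting headroom)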

Parameters

estimator : estimator object

An estimator object implementing fit and one of decision_function or predict_proba.

code_size : float, default=1.5

Percentage of the number of classes to be used to create the code book. A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. A number greater than 1 will require more classifiers than one-vs-the-rest.

random_state : int, RandomState instance, default=None

The generator used to initialize the codebook. Pass an int for reproducible output across multiple function calls. See Glossary.

n_jobs : int, default=None

The number of jobs to use for the computation: the multiclass problems are computed in parallel.

None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

estimators_ : list of int(n_classes * code_size) estimators

Estimators used for predictions.

classes_ : ndarray of shape (n_classes,)

Array containing labels.

code_book_ : ndarray of shape (n_classes, code_size)

Binary array containing the code of each class.

n_features_in_ : int

Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

New in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

New in version 1.0.

See Also

OneVsRestClassifier : One-vs-all multiclass strategy.

OneVsOneClassifier : One-vs-one multiclass strategy.

References

[1] "Solving multiclass learning problems via error-correcting output codes", Dietterich T., Bakiri G., Journal of Artificial Intelligence Research 2, 1995.

[2] "The error coding method and PICTs", James G., Hastie T., Journal of Computational and Graphical Statistics 7, 1998.

[3] "The Elements of Statistical Learning", Hastie T., Tibshirani R., Friedman J., page 606 (second edition), 2008.

Examples

>>> from sklearn.multiclass import OutputCodeClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = OutputCodeClassifier(
...     estimator=RandomForestClassifier(random_state=0),
...     random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
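
As a follow-up, the learned codebook can be inspected. With the default code_size=1.5 and the two classes this make_classification call produces, int(2 * 1.5) = 3 binary columns would be expected (assumed output):

>>> clf.code_book_.shape
(2, 3)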