simpleml.models.classifiers.sklearn.naive_bayes
Wrapper module around sklearn.naive_bayes
Module Contents
Classes
SklearnBernoulliNB | No different than base model. Here just to maintain the pattern
SklearnGaussianNB | No different than base model. Here just to maintain the pattern
SklearnMultinomialNB | No different than base model. Here just to maintain the pattern
WrappedSklearnBernoulliNB | Naive Bayes classifier for multivariate Bernoulli models.
WrappedSklearnGaussianNB | Gaussian Naive Bayes (GaussianNB).
WrappedSklearnMultinomialNB | Naive Bayes classifier for multinomial models.
Attributes
Bernoulli |
|
- class simpleml.models.classifiers.sklearn.naive_bayes.SklearnBernoulliNB(has_external_files=True, external_model_kwargs=None, params=None, fitted=False, pipeline_id=None, **kwargs)
Bases:
simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier
No different from the base model. This class exists only to maintain the inheritance pattern Generic Base -> Library Base -> Domain Base -> Individual Models (e.g., [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).
Passthrough kwargs for external models must be separated explicitly, since most external models do not accept arbitrary **kwargs in their constructors.
Two initialization patterns are supported: full initialization in the constructor, or stepwise configuration before fit and save.
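The separation of passthrough kwargs and the two initialization patterns can be illustrated with a minimal sketch. `ExternalEstimator` and `ModelWrapper` are hypothetical stand-ins written for this example; they are not SimpleML's actual classes, and the real constructor takes more arguments than shown here.

```python
class ExternalEstimator:
    """Hypothetical third-party estimator with a strict constructor."""

    def __init__(self, *, alpha=1.0, fit_prior=True):
        self.alpha = alpha
        self.fit_prior = fit_prior


class ModelWrapper:
    """Hypothetical wrapper, loosely modeled on the pattern described above."""

    def __init__(self, external_model_kwargs=None, **kwargs):
        # Arguments meant for the external estimator are kept apart from
        # wrapper-level metadata, which is collected in **kwargs.
        self.external_model_kwargs = dict(external_model_kwargs or {})
        self.metadata = kwargs
        self.external_model = None

    def build(self):
        # Only the explicitly separated kwargs reach the strict constructor.
        self.external_model = ExternalEstimator(**self.external_model_kwargs)
        return self.external_model


# Pattern 1: full initialization in the constructor.
full = ModelWrapper(external_model_kwargs={"alpha": 0.5}, name="nb", version=1)
full_model = full.build()

# Pattern 2: stepwise configuration before the estimator is built.
stepwise = ModelWrapper(name="nb")
stepwise.external_model_kwargs["alpha"] = 0.5
stepwise_model = stepwise.build()

# Passing wrapper metadata straight to the strict constructor fails,
# which is why the two groups of arguments are separated.
try:
    ExternalEstimator(alpha=0.5, name="nb")
    rejected = False
except TypeError:
    rejected = True
```

In this sketch both patterns end with an estimator configured the same way; the only difference is when the estimator kwargs are supplied.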
- class simpleml.models.classifiers.sklearn.naive_bayes.SklearnGaussianNB(has_external_files=True, external_model_kwargs=None, params=None, fitted=False, pipeline_id=None, **kwargs)
Bases:
simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier
No different from the base model. This class exists only to maintain the inheritance pattern Generic Base -> Library Base -> Domain Base -> Individual Models (e.g., [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).
Passthrough kwargs for external models must be separated explicitly, since most external models do not accept arbitrary **kwargs in their constructors.
Two initialization patterns are supported: full initialization in the constructor, or stepwise configuration before fit and save.
- class simpleml.models.classifiers.sklearn.naive_bayes.SklearnMultinomialNB(has_external_files=True, external_model_kwargs=None, params=None, fitted=False, pipeline_id=None, **kwargs)
Bases:
simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier
No different from the base model. This class exists only to maintain the inheritance pattern Generic Base -> Library Base -> Domain Base -> Individual Models (e.g., [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).
Passthrough kwargs for external models must be separated explicitly, since most external models do not accept arbitrary **kwargs in their constructors.
Two initialization patterns are supported: full initialization in the constructor, or stepwise configuration before fit and save.
- class simpleml.models.classifiers.sklearn.naive_bayes.WrappedSklearnBernoulliNB(*, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)
Bases:
sklearn.naive_bayes.BernoulliNB, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin
Naive Bayes classifier for multivariate Bernoulli models.
Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.
Read more in the User Guide.
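To make the difference between occurrence counts and binary features concrete, here is a minimal pure-Python sketch of thresholding. The helper `binarize` is written for this illustration only and assumes the usual convention that values strictly greater than the threshold map to 1 and all others to 0.

```python
def binarize(rows, threshold=0.0):
    """Map each feature to 1 if it is strictly greater than threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]


# Word counts per document (the kind of input MultinomialNB works with).
counts = [[0, 3, 1], [2, 0, 0]]

# Presence/absence features (the kind of input BernoulliNB models).
binary = binarize(counts, threshold=0.0)
# binary == [[0, 1, 1], [1, 0, 0]]

# A higher threshold keeps only features that occur more than once.
frequent = binarize(counts, threshold=1.0)
# frequent == [[0, 1, 0], [1, 0, 0]]
```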
Parameters:
- alpha : float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
- binarize : float or None, default=0.0
Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.
- fit_prior : bool, default=True
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
- class_prior : array-like of shape (n_classes,), default=None
Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
Attributes:
- class_count_ : ndarray of shape (n_classes,)
Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
- class_log_prior_ : ndarray of shape (n_classes,)
Log probability of each class (smoothed).
- classes_ : ndarray of shape (n_classes,)
Class labels known to the classifier.
- coef_ : ndarray of shape (n_classes, n_features)
Mirrors feature_log_prob_ for interpreting BernoulliNB as a linear model.
- feature_count_ : ndarray of shape (n_classes, n_features)
Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
- feature_log_prob_ : ndarray of shape (n_classes, n_features)
Empirical log probability of features given a class, P(x_i|y).
- intercept_ : ndarray of shape (n_classes,)
Mirrors class_log_prior_ for interpreting BernoulliNB as a linear model.
- n_features_ : int
Number of features of each sample.
Deprecated since version 1.0: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
- n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
See also:
- CategoricalNB : Naive Bayes classifier for categorical features.
- ComplementNB : The Complement Naive Bayes classifier described in Rennie et al. (2003).
- GaussianNB : Gaussian Naive Bayes (GaussianNB).
- MultinomialNB : Naive Bayes classifier for multinomial models.
References:
- C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
- A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.
- V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).
Examples:
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB()
>>> print(clf.predict(X[2:3]))
[3]
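The role of the smoothing parameter alpha in the per-class feature log probabilities can be sketched in plain Python. This is an illustrative reimplementation under the usual additive-smoothing formula for binary features, (count + alpha) / (class_count + 2 * alpha); the function name and the toy data are assumptions made for this example, not part of the wrapped class.

```python
import math


def bernoulli_feature_log_prob(X, y, alpha=1.0):
    """Return {label: [log P(x_i = 1 | label), ...]} with additive smoothing."""
    n_features = len(X[0])
    result = {}
    for label in sorted(set(y)):
        rows = [row for row, target in zip(X, y) if target == label]
        class_count = len(rows)
        log_probs = []
        for i in range(n_features):
            feature_count = sum(row[i] for row in rows)
            # Each feature has two outcomes (0 or 1), hence 2 * alpha below.
            prob = (feature_count + alpha) / (class_count + 2 * alpha)
            log_probs.append(math.log(prob))
        result[label] = log_probs
    return result


# Toy binary data: two samples of class "a", one of class "b".
X = [[1, 0], [1, 1], [0, 1]]
y = ["a", "a", "b"]

log_prob = bernoulli_feature_log_prob(X, y, alpha=1.0)
prob_a = [math.exp(value) for value in log_prob["a"]]  # approx. [0.75, 0.5]
prob_b = [math.exp(value) for value in log_prob["b"]]  # approx. [1/3, 2/3]
```

With alpha=1.0 no estimated probability is exactly 0 or 1, even for a feature that is always present in class "a", so the log probabilities stay finite.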
- class simpleml.models.classifiers.sklearn.naive_bayes.WrappedSklearnGaussianNB(*, priors=None, var_smoothing=1e-09)
Bases:
sklearn.naive_bayes.GaussianNB, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin
Gaussian Naive Bayes (GaussianNB).
Can perform online updates to model parameters via partial_fit(). For details on the algorithm used to update feature means and variance online, see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque.
Read more in the User Guide.
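The idea behind online updates of means and variances can be sketched as a pairwise merge of batch statistics, in the spirit of the Chan, Golub, and LeVeque report. The functions below are a simplified single-feature illustration written for this page; they are not scikit-learn's implementation, and the names `batch_stats` and `merge_stats` are hypothetical.

```python
def batch_stats(values):
    """Return (count, mean, sum of squared deviations) for one batch."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((value - mean) ** 2 for value in values)
    return n, mean, m2


def merge_stats(a, b):
    """Combine the statistics of two batches without revisiting the data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


first_batch = [1.0, 2.0, 3.0]
second_batch = [4.0, 6.0]

n, mean, m2 = merge_stats(batch_stats(first_batch), batch_stats(second_batch))
variance = m2 / n  # population variance of all five values

# The merged result matches a single pass over the concatenated data.
n_all, mean_all, m2_all = batch_stats(first_batch + second_batch)
```

Merging keeps only three numbers per feature and class, which is what makes incremental fitting on successive batches possible.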
Parameters:
- priors : array-like of shape (n_classes,)
Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
- var_smoothing : float, default=1e-9
Portion of the largest variance of all features that is added to variances for calculation stability.
New in version 0.20.
Attributes:
- class_count_ : ndarray of shape (n_classes,)
Number of training samples observed in each class.
- class_prior_ : ndarray of shape (n_classes,)
Probability of each class.
- classes_ : ndarray of shape (n_classes,)
Class labels known to the classifier.
- epsilon_ : float
Absolute additive value to variances.
- n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
- sigma_ : ndarray of shape (n_classes, n_features)
Variance of each feature per class.
Deprecated since version 1.0: sigma_ is deprecated in 1.0 and will be removed in 1.2. Use var_ instead.
- var_ : ndarray of shape (n_classes, n_features)
Variance of each feature per class.
New in version 1.0.
- theta_ : ndarray of shape (n_classes, n_features)
Mean of each feature per class.
See also:
- BernoulliNB : Naive Bayes classifier for multivariate Bernoulli models.
- CategoricalNB : Naive Bayes classifier for categorical features.
- ComplementNB : Complement Naive Bayes classifier.
- MultinomialNB : Naive Bayes classifier for multinomial models.
Examples:
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB()
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> clf_pf = GaussianNB()
>>> clf_pf.partial_fit(X, Y, np.unique(Y))
GaussianNB()
>>> print(clf_pf.predict([[-0.8, -1]]))
[1]
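How var_smoothing turns into an absolute additive value can be sketched in plain Python on the same toy data. The sketch assumes the additive term is var_smoothing multiplied by the largest per-feature variance of the training data, which is how the parameter description reads; the helper names are hypothetical and this is not the library's code.

```python
def population_variance(values):
    """Variance with a divisor of n (not n - 1)."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / len(values)


def smoothing_epsilon(X, var_smoothing=1e-9):
    """Scale var_smoothing by the largest per-feature variance in X."""
    columns = list(zip(*X))
    return var_smoothing * max(population_variance(column) for column in columns)


X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
epsilon = smoothing_epsilon(X)  # (28 / 6) * 1e-9 for this data

# A feature that is constant within a class has zero variance; adding epsilon
# keeps the Gaussian density well defined.
constant_feature = [5.0, 5.0, 5.0]
smoothed_variance = population_variance(constant_feature) + epsilon
```

Because the term is relative to the data's own scale, the same default works whether features are measured in thousandths or in thousands.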
- class simpleml.models.classifiers.sklearn.naive_bayes.WrappedSklearnMultinomialNB(*, alpha=1.0, fit_prior=True, class_prior=None)
Bases:
sklearn.naive_bayes.MultinomialNB, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin
Naive Bayes classifier for multinomial models.
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.
Read more in the User Guide.
Parameters:
- alpha : float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
- fit_prior : bool, default=True
Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
- class_prior : array-like of shape (n_classes,), default=None
Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
Attributes:
- class_count_ : ndarray of shape (n_classes,)
Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
- class_log_prior_ : ndarray of shape (n_classes,)
Smoothed empirical log probability for each class.
- classes_ : ndarray of shape (n_classes,)
Class labels known to the classifier.
- coef_ : ndarray of shape (n_classes, n_features)
Mirrors feature_log_prob_ for interpreting MultinomialNB as a linear model.
Deprecated since version 0.24: coef_ is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26).
- feature_count_ : ndarray of shape (n_classes, n_features)
Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
- feature_log_prob_ : ndarray of shape (n_classes, n_features)
Empirical log probability of features given a class, P(x_i|y).
- intercept_ : ndarray of shape (n_classes,)
Mirrors class_log_prior_ for interpreting MultinomialNB as a linear model.
Deprecated since version 0.24: intercept_ is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26).
- n_features_ : int
Number of features of each sample.
Deprecated since version 1.0: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
- n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
See also:
- BernoulliNB : Naive Bayes classifier for multivariate Bernoulli models.
- CategoricalNB : Naive Bayes classifier for categorical features.
- ComplementNB : Complement Naive Bayes classifier.
- GaussianNB : Gaussian Naive Bayes.
Notes:
For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.
References:
- C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
Examples:
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB()
>>> print(clf.predict(X[2:3]))
[3]
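The linear-model reading of multinomial naive Bayes (a per-class intercept from the class log prior plus a weighted sum of feature counts) can be sketched in plain Python. This is a simplified illustration under the standard additive-smoothing formula, using empirical class frequencies as priors; `fit_multinomial`, `predict`, and the toy data are assumptions for this example rather than the wrapped class's code.

```python
import math


def fit_multinomial(X, y, alpha=1.0):
    """Return per-class log priors and smoothed feature log probabilities."""
    n_features = len(X[0])
    log_prior, feature_log_prob = {}, {}
    for label in sorted(set(y)):
        rows = [row for row, target in zip(X, y) if target == label]
        log_prior[label] = math.log(len(rows) / len(X))
        # Add alpha to every feature count, then normalize within the class.
        counts = [sum(row[i] for row in rows) + alpha for i in range(n_features)]
        total = sum(counts)
        feature_log_prob[label] = [math.log(count / total) for count in counts]
    return log_prior, feature_log_prob


def predict(x, log_prior, feature_log_prob):
    """Pick the class with the highest linear score: intercept + weights . x."""
    scores = {
        label: log_prior[label]
        + sum(weight * value for weight, value in zip(feature_log_prob[label], x))
        for label in log_prior
    }
    return max(scores, key=scores.get)


# Toy word-count data: three features, two classes.
X = [[3, 0, 1], [2, 0, 0], [0, 4, 1]]
y = ["spam", "spam", "ham"]

log_prior, feature_log_prob = fit_multinomial(X, y, alpha=1.0)
label = predict([1, 0, 0], log_prior, feature_log_prob)  # "spam"
```

Here the feature log probabilities play the role of the weights and the log priors the role of the intercepts, which is the relationship the coef_ and intercept_ attribute names refer to.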