`simpleml.models.classifiers.sklearn.naive_bayes`¶

Wrapper module around sklearn.naive_bayes

Module Contents¶

Classes¶

`SklearnBernoulliNB`	No different from the base model; exists only to maintain the pattern
`SklearnGaussianNB`	No different from the base model; exists only to maintain the pattern
`SklearnMultinomialNB`	No different from the base model; exists only to maintain the pattern
`WrappedSklearnBernoulliNB`	Naive Bayes classifier for multivariate Bernoulli models.
`WrappedSklearnGaussianNB`	Gaussian Naive Bayes (GaussianNB)
`WrappedSklearnMultinomialNB`	Naive Bayes classifier for multinomial models

simpleml.models.classifiers.sklearn.naive_bayes.LOGGER[source]¶

simpleml.models.classifiers.sklearn.naive_bayes.__author__ = Elisha Yadgaran[source]¶

class simpleml.models.classifiers.sklearn.naive_bayes.SklearnBernoulliNB(has_external_files=True, external_model_kwargs=None, params=None, **kwargs)[source]¶

Bases: simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier

No different from the base model; it exists only to maintain the class hierarchy pattern: Generic Base -> Library Base -> Domain Base -> Individual Models (ex: [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).

Passthrough kwargs for the external model must be separated explicitly, since most external models do not support arbitrary **kwargs in their constructors.

_create_external_model(self, **kwargs)[source]¶

Abstract method for each subclass to implement. Should return the desired model object.
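
The layering and kwarg separation described above can be sketched in plain Python. Every name in this block (`ExternalModelStub`, `BaseModelSketch`, `NaiveBayesSketch`) is a hypothetical stand-in rather than a SimpleML class; the sketch only illustrates why constructor arguments meant for the external model are kept in their own dict and handed over inside `_create_external_model`.

```python
class ExternalModelStub:
    """Hypothetical stand-in for a third-party estimator with a strict constructor."""

    def __init__(self, *, alpha=1.0, fit_prior=True):
        self.alpha = alpha
        self.fit_prior = fit_prior


class BaseModelSketch:
    """Hypothetical base: keeps wrapper-level settings and builds the external model."""

    def __init__(self, has_external_files=True, external_model_kwargs=None,
                 params=None, **kwargs):
        self.has_external_files = has_external_files
        self.params = params or {}
        # Arbitrary wrapper-level kwargs stay here and never reach the external model.
        self.metadata = kwargs
        self.external_model = self._create_external_model(
            **(external_model_kwargs or {})
        )

    def _create_external_model(self, **kwargs):
        raise NotImplementedError("each subclass returns its own model object")


class NaiveBayesSketch(BaseModelSketch):
    def _create_external_model(self, **kwargs):
        # Only the explicitly separated kwargs are forwarded.
        return ExternalModelStub(**kwargs)


model = NaiveBayesSketch(external_model_kwargs={"alpha": 0.5}, name="demo")
print(model.external_model.alpha)  # 0.5

# Passing a wrapper-level kwarg straight to the strict constructor fails.
try:
    ExternalModelStub(name="demo")
except TypeError as exc:
    print("rejected:", exc)
```

Keeping the two kwarg sets apart lets the wrapper accept free-form metadata while the external constructor still sees only arguments it declares.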

class simpleml.models.classifiers.sklearn.naive_bayes.SklearnGaussianNB(has_external_files=True, external_model_kwargs=None, params=None, **kwargs)[source]¶

Bases: simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier

No different from the base model; it exists only to maintain the class hierarchy pattern: Generic Base -> Library Base -> Domain Base -> Individual Models (ex: [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).

Passthrough kwargs for the external model must be separated explicitly, since most external models do not support arbitrary **kwargs in their constructors.

_create_external_model(self, **kwargs)[source]¶

Abstract method for each subclass to implement. Should return the desired model object.

class simpleml.models.classifiers.sklearn.naive_bayes.SklearnMultinomialNB(has_external_files=True, external_model_kwargs=None, params=None, **kwargs)[source]¶

Bases: simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier

No different from the base model; it exists only to maintain the class hierarchy pattern: Generic Base -> Library Base -> Domain Base -> Individual Models (ex: [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).

Passthrough kwargs for the external model must be separated explicitly, since most external models do not support arbitrary **kwargs in their constructors.

_create_external_model(self, **kwargs)[source]¶

Abstract method for each subclass to implement. Should return the desired model object.

class simpleml.models.classifiers.sklearn.naive_bayes.WrappedSklearnBernoulliNB(*, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)[source]¶

Bases: sklearn.naive_bayes.BernoulliNB, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin

Naive Bayes classifier for multivariate Bernoulli models.

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.

Read more in the User Guide.

Parameters:

alpha (float, default=1.0): Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
binarize (float or None, default=0.0): Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.
fit_prior (bool, default=True): Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
class_prior (array-like of shape (n_classes,), default=None): Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

Attributes:

class_count_ (ndarray of shape (n_classes,)): Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
class_log_prior_ (ndarray of shape (n_classes,)): Log probability of each class (smoothed).
classes_ (ndarray of shape (n_classes,)): Class labels known to the classifier.
coef_ (ndarray of shape (n_classes, n_features)): Mirrors feature_log_prob_ for interpreting BernoulliNB as a linear model.
feature_count_ (ndarray of shape (n_classes, n_features)): Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
feature_log_prob_ (ndarray of shape (n_classes, n_features)): Empirical log probability of features given a class, P(x_i|y).
intercept_ (ndarray of shape (n_classes,)): Mirrors class_log_prior_ for interpreting BernoulliNB as a linear model.
n_features_ (int): Number of features of each sample.

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB()
>>> print(clf.predict(X[2:3]))
[3]
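
To make the roles of binarize and alpha concrete, here is a minimal standard-library sketch of a Bernoulli naive Bayes fit and prediction. It is an illustration under stated assumptions, not the scikit-learn implementation: the function names are hypothetical, sample weights are ignored, class priors are always learned from the data, and alpha is assumed to be greater than 0.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0, binarize=0.0):
    """Return (classes, class_log_prior, feature_log_prob) for thresholded X."""
    # Map every feature to 0/1: values strictly above the threshold become 1.
    Xb = [[1 if v > binarize else 0 for v in row] for row in X]
    classes = sorted(set(y))
    n_features = len(Xb[0])
    class_log_prior, feature_log_prob = [], []
    for c in classes:
        rows = [row for row, label in zip(Xb, y) if label == c]
        class_count = len(rows)
        class_log_prior.append(math.log(class_count / len(y)))
        # Additive smoothing: alpha is added to the count of ones and 2 * alpha
        # to the class count, because each feature has two possible outcomes.
        feature_log_prob.append([
            math.log((sum(r[j] for r in rows) + alpha) / (class_count + 2 * alpha))
            for j in range(n_features)
        ])
    return classes, class_log_prior, feature_log_prob


def predict_bernoulli_nb(model, x, binarize=0.0):
    classes, class_log_prior, feature_log_prob = model
    xb = [1 if v > binarize else 0 for v in x]
    scores = []
    for prior, log_probs in zip(class_log_prior, feature_log_prob):
        score = prior
        for bit, log_p in zip(xb, log_probs):
            # Use log P(x_i = 1 | y) when the feature is on, log(1 - P) when off.
            score += log_p if bit else math.log1p(-math.exp(log_p))
        scores.append(score)
    return classes[scores.index(max(scores))]


X = [[3, 0, 0], [2, 0, 1], [0, 4, 0], [0, 1, 0]]
y = ["a", "a", "b", "b"]
nb_model = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(nb_model, [5, 0, 0]))  # a
```

Unlike the multinomial model, an absent feature contributes log(1 - P(x_i|y)) to the score, so the absence of a feature is itself evidence.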

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

get_feature_metadata(self, features, **kwargs)[source]¶: By default nothing is implemented

class simpleml.models.classifiers.sklearn.naive_bayes.WrappedSklearnGaussianNB(*, priors=None, var_smoothing=1e-09)[source]¶

Bases: sklearn.naive_bayes.GaussianNB, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin

Gaussian Naive Bayes (GaussianNB)

Can perform online updates to model parameters via partial_fit(). For details on the algorithm used to update feature means and variance online, see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque:

http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
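
The kind of online update referred to above can be sketched as a pairwise merge of a running (count, mean, variance) triple with the statistics of a new batch. This is a plain-Python illustration for a single feature, using population variance; the function name `merge_mean_variance` is hypothetical and the code is not taken from scikit-learn.

```python
import statistics


def merge_mean_variance(n_past, mean_past, var_past, batch):
    """Combine running (count, mean, population variance) with a new batch."""
    n_new = len(batch)
    mean_new = statistics.fmean(batch)
    var_new = statistics.pvariance(batch, mu=mean_new)
    if n_past == 0:
        return n_new, mean_new, var_new
    n_total = n_past + n_new
    mean_total = (n_past * mean_past + n_new * mean_new) / n_total
    # Sum of squared deviations of the union: the two parts' sums plus a
    # correction term for the gap between the two means.
    ssd_total = (
        n_past * var_past
        + n_new * var_new
        + (n_past * n_new / n_total) * (mean_past - mean_new) ** 2
    )
    return n_total, mean_total, ssd_total / n_total


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
state = (0, 0.0, 0.0)
# Feed the data in chunks of three, as successive partial_fit calls would.
for start in range(0, len(data), 3):
    state = merge_mean_variance(*state, data[start:start + 3])
print(state)
```

Because only the count, mean, and variance are carried between batches, earlier samples never need to be revisited, and the result matches a single pass over all the data up to floating-point error.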

Read more in the User Guide.

class simpleml.models.classifiers.sklearn.naive_bayes.WrappedSklearnMultinomialNB(*, alpha=1.0, fit_prior=True, class_prior=None)[source]¶

Bases: sklearn.naive_bayes.MultinomialNB, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin

Naive Bayes classifier for multinomial models

Parameters:

alpha (float, default=1.0): Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
fit_prior (bool, default=True): Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
class_prior (array-like of shape (n_classes,), default=None): Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

Attributes:

class_count_ (ndarray of shape (n_classes,)): Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
class_log_prior_ (ndarray of shape (n_classes,)): Smoothed empirical log probability for each class.
classes_ (ndarray of shape (n_classes,)): Class labels known to the classifier.
coef_ (ndarray of shape (n_classes, n_features)): Mirrors feature_log_prob_ for interpreting MultinomialNB as a linear model. Deprecated since version 0.24: coef_ is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26).
feature_count_ (ndarray of shape (n_classes, n_features)): Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
feature_log_prob_ (ndarray of shape (n_classes, n_features)): Empirical log probability of features given a class, P(x_i|y).
intercept_ (ndarray of shape (n_classes,)): Mirrors class_log_prior_ for interpreting MultinomialNB as a linear model. Deprecated since version 0.24: intercept_ is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26).
n_features_ (int): Number of features of each sample.

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB()
>>> print(clf.predict(X[2:3]))
[3]

For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.
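
The linear-classifier reading behind the names coef_ and intercept_ can be shown with a small standard-library sketch: the per-class score is the class log prior plus a dot product between the feature log probabilities and the count vector. The function names are hypothetical, sample weights are ignored, and this is an illustration rather than the scikit-learn code.

```python
import math


def fit_multinomial_nb(X, y, alpha=1.0):
    """Return (classes, intercepts, coefs) with score = intercept + dot(coef, x)."""
    classes = sorted(set(y))
    n_features = len(X[0])
    intercepts, coefs = [], []
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        # Intercept: log prior of the class, learned from class frequencies.
        intercepts.append(math.log(len(rows) / len(y)))
        feature_count = [sum(r[j] for r in rows) for j in range(n_features)]
        # Smoothed total: alpha is added once per feature.
        total = sum(feature_count) + alpha * n_features
        # Coefficients: smoothed log P(x_i | y) for each feature.
        coefs.append([math.log((fc + alpha) / total) for fc in feature_count])
    return classes, intercepts, coefs


def predict_linear(model, x):
    classes, intercepts, coefs = model
    scores = [
        b + sum(w * v for w, v in zip(ws, x))
        for b, ws in zip(intercepts, coefs)
    ]
    return classes[scores.index(max(scores))]


X = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 2, 0]]
y = [1, 1, 2, 2]
linear_model = fit_multinomial_nb(X, y)
print(predict_linear(linear_model, [4, 1, 0]))  # 1
```

Since the score is affine in the count vector x, the decision boundary between any two classes is a hyperplane, which is what motivates the linear-model attribute names.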

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

get_feature_metadata(self, features, **kwargs)[source]¶: By default nothing is implemented
