simpleml.models.classifiers.sklearn.neighbors
Wrapper module around sklearn.neighbors
Module Contents
Classes
SklearnKNeighborsClassifier: No different than base model. Here just to maintain the pattern.
WrappedSklearnKNeighborsClassifier: Classifier implementing the k-nearest neighbors vote.
Attributes
- class simpleml.models.classifiers.sklearn.neighbors.SklearnKNeighborsClassifier(has_external_files=True, external_model_kwargs=None, params=None, fitted=False, pipeline_id=None, **kwargs)[source]
Bases: simpleml.models.classifiers.sklearn.base_sklearn_classifier.SklearnClassifier
No different from the base model. It exists only to maintain the pattern: Generic Base -> Library Base -> Domain Base -> Individual Models (e.g. [Library]Model -> SklearnModel -> SklearnClassifier -> SklearnLogisticRegression).
Passthrough kwargs for external models must be separated explicitly, since most external models do not support arbitrary **kwargs in their constructors.
Two patterns are supported: full initialization in the constructor, or stepwise configuration before fit and save.
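The separation of passthrough kwargs described above can be illustrated with a minimal sketch. The class `SketchWrapper` and its attributes are hypothetical and are not simpleml's implementation; only the argument names `external_model_kwargs` and `params` are taken from the signature above. The sketch shows why arbitrary keyword arguments cannot simply be forwarded to a scikit-learn constructor, and mimics the two configuration patterns.

```python
from sklearn.neighbors import KNeighborsClassifier


class SketchWrapper:
    """Hypothetical stand-in for a wrapper model (not simpleml's actual class)."""

    def __init__(self, external_model_kwargs=None, params=None, **kwargs):
        # Keyword arguments meant for the external (scikit-learn) constructor
        # are kept apart from the wrapper's own metadata kwargs.
        self.external_model_kwargs = dict(external_model_kwargs or {})
        self.params = dict(params or {})
        self.metadata = kwargs  # e.g. a name for the wrapper itself

    def set_params(self, **params):
        # Stepwise configuration: parameters can be added after construction.
        self.params.update(params)

    def fit(self, X, y):
        # The external model only ever receives arguments it understands.
        self.external_model = KNeighborsClassifier(**self.external_model_kwargs)
        self.external_model.set_params(**self.params)
        self.external_model.fit(X, y)
        return self


X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

# Pattern 1: full initialization in the constructor.
full = SketchWrapper(
    external_model_kwargs={"n_neighbors": 3},
    params={"weights": "distance"},
    name="knn-sketch",  # wrapper-only kwarg, never passed to scikit-learn
).fit(X, y)

# Pattern 2: stepwise configuration before fit.
stepwise = SketchWrapper(name="knn-sketch")
stepwise.external_model_kwargs = {"n_neighbors": 3}
stepwise.set_params(weights="distance")
stepwise.fit(X, y)

# Forwarding the wrapper-only kwarg directly to scikit-learn fails,
# because the constructor has an explicit signature.
try:
    KNeighborsClassifier(n_neighbors=3, name="knn-sketch")
    forwarding_failed = False
except TypeError:
    forwarding_failed = True
```

Both patterns end up with an identically configured external model; the difference is only when the configuration is supplied.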
- Parameters
- class simpleml.models.classifiers.sklearn.neighbors.WrappedSklearnKNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)[source]
Bases: sklearn.neighbors.KNeighborsClassifier, simpleml.models.classifiers.external_models.ClassificationExternalModelMixin
Classifier implementing the k-nearest neighbors vote.
Read more in the User Guide.
- n_neighbors : int, default=5
Number of neighbors to use by default for kneighbors() queries.
- weights : {'uniform', 'distance'} or callable, default='uniform'
Weight function used in prediction. Possible values:
'uniform' : uniform weights. All points in each neighborhood are weighted equally.
'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
- algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
Algorithm used to compute the nearest neighbors:
'ball_tree' will use BallTree.
'kd_tree' will use KDTree.
'brute' will use a brute-force search.
'auto' will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
- leaf_size : int, default=30
Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
- p : int, default=2
Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
- metric : str or callable, default='minkowski'
The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. For a list of available metrics, see the documentation of DistanceMetric. If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only "nonzero" elements may be considered neighbors.
- metric_params : dict, default=None
Additional keyword arguments for the metric function.
- n_jobs : int, default=None
The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn't affect the fit() method.
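As a rough illustration of two of the parameters above, the sketch below passes a callable as `weights` and compares `p=1` with `p=2` under the default Minkowski metric. The data, the query points, and the helper `inverse_square` are made up for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [0.5], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
query = np.array([[1.4]])


def inverse_square(distances):
    # Accepts an array of distances and returns weights of the same shape.
    # (No zero distances occur in this example, so no division by zero.)
    return 1.0 / (distances ** 2)


uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)
custom = KNeighborsClassifier(n_neighbors=3, weights=inverse_square).fit(X, y)

# The three nearest neighbors of 1.4 are 2.0 (class 1), 0.5 and 0.0 (class 0).
pred_uniform = uniform.predict(query)  # plain majority vote
pred_custom = custom.predict(query)    # the closest neighbor dominates

# With the default Minkowski metric, p selects the distance:
# p=1 is Manhattan (l1), p=2 is Euclidean (l2).
X2 = np.array([[3.0, 4.0], [10.0, 10.0]])
y2 = np.array([0, 1])
origin = np.array([[0.0, 0.0]])

dist_l1, _ = KNeighborsClassifier(n_neighbors=1, p=1).fit(X2, y2).kneighbors(origin)
dist_l2, _ = KNeighborsClassifier(n_neighbors=1, p=2).fit(X2, y2).kneighbors(origin)

print(pred_uniform, pred_custom)
print(dist_l1, dist_l2)
```

Here the uniform vote and the custom weighting disagree because the single closest neighbor belongs to the minority class among the three neighbors.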
- classes_ : array of shape (n_classes,)
Class labels known to the classifier.
- effective_metric_ : str or callable
The distance metric used. It will be the same as the metric parameter or a synonym of it, e.g. 'euclidean' if the metric parameter is set to 'minkowski' and the p parameter is set to 2.
- effective_metric_params_ : dict
Additional keyword arguments for the metric function. For most metrics, this will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to 'minkowski'.
- n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
- n_samples_fit_ : int
Number of samples in the fitted data.
- outputs_2d_ : bool
False when y's shape is (n_samples,) or (n_samples, 1) during fit, otherwise True.
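The fitted attributes listed above can be inspected after calling fit(). The following is a small sketch on toy data, assuming a scikit-learn version that provides `n_features_in_` (0.24 or later); `feature_names_in_` is left out because it is only defined when X carries string feature names, which a plain list does not.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2).fit(X, y)

# Collect the fitted attributes (all end with a trailing underscore).
fitted = {
    "classes_": knn.classes_.tolist(),
    "effective_metric_": knn.effective_metric_,
    "effective_metric_params_": knn.effective_metric_params_,
    "n_features_in_": knn.n_features_in_,
    "n_samples_fit_": knn.n_samples_fit_,
    "outputs_2d_": knn.outputs_2d_,
}

for name, value in fitted.items():
    print(f"{name}: {value!r}")
```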
RadiusNeighborsClassifier: Classifier based on neighbors within a fixed radius.
KNeighborsRegressor: Regression based on k-nearest neighbors.
RadiusNeighborsRegressor: Regression based on neighbors within a fixed radius.
NearestNeighbors: Unsupervised learner for implementing neighbor searches.
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
Warning
Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
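The order dependence described in this warning can be made concrete with a toy 1-nearest-neighbor function. This is a deliberately simplified sketch in plain Python, not scikit-learn's implementation; the function `toy_1nn` and its stable-sort tie-breaking are assumptions chosen only to show the mechanism. Which neighbor scikit-learn itself returns on a tie depends on its search algorithm and is not reproduced here.

```python
def toy_1nn(train_X, train_y, query):
    """Toy 1-nearest-neighbor on scalars (not scikit-learn's implementation)."""
    distances = [abs(x - query) for x in train_X]
    # sorted() is stable, so among equal distances the earlier training row wins.
    order = sorted(range(len(train_X)), key=lambda i: distances[i])
    return train_y[order[0]]


# Both training points are exactly 1.0 away from the query,
# and they carry different labels.
label_a = toy_1nn([0.0, 2.0], [0, 1], query=1.0)

# Same training data with the two rows swapped.
label_b = toy_1nn([2.0, 0.0], [1, 0], query=1.0)

print(label_a, label_b)
```

The training set is identical in both calls; only the row order changes, and with it the predicted label.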
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.666... 0.333...]]