Sklearn K-Means
K-Means clustering algorithm using scikit-learn. K-Means falls in the general category of clustering algorithms, which partition observations into groups based on similarity without using labels.
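To make the partitioning idea concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the classic K-Means iteration: assign each observation to its nearest center, move each center to the mean of its assigned observations, and repeat until the centers stop moving. It is an illustration only, not the node's or scikit-learn's implementation. The function name `lloyd_kmeans` is hypothetical, initialization is plain random sampling of observations, and the stopping test uses an absolute center shift rather than scikit-learn's relative `tol`.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, tol=1e-4, seed=0):
    """Hypothetical helper: plain Lloyd's K-Means (assignment step + update step)."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared distance of every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers)  # Frobenius norm of the change
        centers = new_centers
        if shift <= tol:
            break
    # Final assignment against the converged centers, plus the resulting inertia.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(X)), labels].sum()
    return centers, labels, inertia


# Usage on two obvious groups of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels, inertia = lloyd_kmeans(X, k=2)
```

No labels are used anywhere in the loop; the grouping comes purely from distances between observations and centers.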
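Input
Takes a DataFrame as input.
Output
Outputs the cluster centers, a cluster label for each row, and metrics such as inertia (the sum of squared distances from each observation to its nearest cluster center).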
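For reference, these outputs correspond to attributes of scikit-learn's `KMeans` after fitting: `cluster_centers_`, `labels_` and `inertia_`. The sketch below fits it on a small illustrative DataFrame and recomputes inertia by hand to show what the metric measures. It is plain scikit-learn usage, not the node's internal code, and the toy data is made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Illustrative input DataFrame (toy data).
df = pd.DataFrame({"x": [0.0, 0.0, 10.0, 10.0], "y": [0.0, 1.0, 10.0, 11.0]})

model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(df)

centers = model.cluster_centers_  # shape (n_clusters, n_features)
labels = model.labels_            # cluster index for every input row

# Inertia: sum of squared distances from each row to its assigned center.
X = df.to_numpy()
manual_inertia = ((X - centers[labels]) ** 2).sum()
print(model.inertia_, manual_inertia)
```

Lower inertia means tighter clusters, but it always decreases as K grows, which is why it is not used on its own to pick K.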
Type
ml-estimator
Class
fire.nodes.sklearn.NodeSklearnKMeans
Fields
| Name | Title | Description |
|---|---|---|
| n_clusters | Number of Clusters | The number of clusters to form. This is the default K; if ‘Estimate K’ is True, it is overridden by the estimated value. |
| estimate_k | Estimate K | If True, the node iterates through the ‘K Search Range’ to find the best K automatically. |
| k_search_range | K Search Range | Comma-separated range (e.g., ‘2,15’) of K values to test for the optimal K. Only used if ‘Estimate K’ is True. |
| optimization_metric | Optimization Metric | The metric used to determine the best K. With Silhouette, the K with the highest silhouette score is chosen; with Inertia, the K at the elbow of the inertia curve is chosen. |
| featureCols | Feature Columns | Features to be used for clustering. Leaving this empty uses all columns. |
| init | Initialization Mode | Method for initialization: ‘k-means++’ for smart initialization that speeds up convergence, or ‘random’ to choose n_clusters observations at random as the initial centroids. |
| n_init | Number of Initializations | Number of times the k-means algorithm is run with different centroid seeds. The run with the lowest inertia is kept as the final result. |
| max_iter | Max Iterations | Maximum number of iterations for a single run of the k-means algorithm. |
| tol | Tolerance | Relative tolerance, with regard to the Frobenius norm of the difference in cluster centers between two consecutive iterations, used to declare convergence. |
| random_state | Random State | Seed for random number generation, used to make results deterministic. Leave empty for non-deterministic behavior. |
| algorithm | Algorithm | The K-means algorithm variant to use (scikit-learn's KMeans accepts, for example, ‘lloyd’ and ‘elkan’). |
| saveCentroidsPath | Save Centroids Path | Path at which the cluster centroids are saved as a CSV file. |
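When ‘Estimate K’ is enabled, the node searches the ‘K Search Range’ for the best K using the ‘Optimization Metric’. Its exact procedure is not described here, so the following is only a sketch under stated assumptions: the range string is treated as inclusive, K values that silhouette scoring cannot handle (K below 2 or not smaller than the number of rows) are skipped, and the elbow is located with a simple farthest-from-the-chord heuristic. `parse_k_range`, `estimate_k` and the metric strings `"silhouette"` and `"inertia"` are hypothetical names.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def parse_k_range(text):
    """Parse a 'K Search Range' string such as '2,15' (assumed inclusive)."""
    lo, hi = (int(part) for part in text.split(","))
    return range(lo, hi + 1)


def estimate_k(X, k_search_range="2,15", metric="silhouette", random_state=0):
    """Hypothetical K search: max silhouette score, or the elbow of the inertia curve."""
    # silhouette_score needs 2 <= k <= n_samples - 1, so drop values outside that window.
    ks = [k for k in parse_k_range(k_search_range) if 2 <= k < len(X)]
    if not ks:
        raise ValueError("no valid K in the search range")
    scores = []
    for k in ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        if metric == "silhouette":
            scores.append(silhouette_score(X, model.labels_))
        else:
            scores.append(model.inertia_)
    if metric == "silhouette":
        return ks[int(np.argmax(scores))]  # highest silhouette score wins
    if len(ks) < 3:
        return ks[0]  # an elbow needs at least three points on the curve
    # Elbow heuristic (an assumption): rescale both axes to [0, 1] and pick the K
    # whose point lies farthest from the straight line joining the curve's endpoints.
    x = (np.asarray(ks, dtype=float) - ks[0]) / (ks[-1] - ks[0])
    y = np.asarray(scores, dtype=float)
    span = y.max() - y.min()
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    dy = y[-1] - y[0]
    dist = np.abs(x * dy - (y - y[0])) / np.hypot(1.0, dy)
    return ks[int(np.argmax(dist))]


# Usage: three well-separated square groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [20, 0], [20, 1], [21, 0], [21, 1]], dtype=float)
best_by_silhouette = estimate_k(X, "2,6", metric="silhouette")
best_by_inertia = estimate_k(X, "2,6", metric="inertia")
```

Silhouette has a natural maximum, so choosing K is a plain argmax. Inertia keeps shrinking as K grows, so some elbow rule is needed, and different rules can disagree on noisy data.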