Sklearn K-Means

This node runs the K-Means clustering algorithm using scikit-learn. K-Means belongs to the general category of clustering algorithms, which partition observations into groups based on similarity without using labels.

Input

Takes a DataFrame as input.

Output

Outputs the cluster centers, the cluster labels, and metrics such as inertia.
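As a rough illustration of the input and output, the sketch below fits scikit-learn's `KMeans` on a small pandas DataFrame and reads back the cluster centers, labels, and inertia. The toy data and column names are made up, and this page does not show how the node itself wraps the estimator, so this covers only the underlying scikit-learn call.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy input: two well-separated groups of points.
df = pd.DataFrame({
    "x": [1.0, 1.2, 0.8, 8.0, 8.2, 7.8],
    "y": [1.0, 0.9, 1.1, 8.0, 8.1, 7.9],
})

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df)

centers = model.cluster_centers_   # array of shape (n_clusters, n_features)
labels = model.labels_             # cluster index assigned to each row
inertia = model.inertia_           # sum of squared distances to the nearest center

print(centers.shape, labels.tolist(), round(inertia, 3))
```

Cluster indices are arbitrary: which group is labeled 0 and which is labeled 1 can change with the seed, so downstream logic should not rely on a particular numbering.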

Type

ml-estimator

Class

fire.nodes.sklearn.NodeSklearnKMeans

Fields

Name	Title	Description
n_clusters	Number of Clusters	The default number of clusters to form. If ‘Estimate K’ is True, this value is overridden by the estimated K.
estimate_k	Estimate K	If True, the node iterates through the ‘K Search Range’ to find the best K automatically.
k_search_range	K Search Range	Comma-separated range (e.g., ‘2,15’) of K values to test for the optimal K. Only used if ‘Estimate K’ is True.
optimization_metric	Optimization Metric	The metric used to determine the best K. With Silhouette, the K with the highest score is chosen; with Inertia, the K at the elbow of the inertia curve is chosen.
featureCols	Feature Columns	Features to be used for clustering. If left empty, all columns are used.
init	Initialization Mode	Method for initialization: ‘k-means++’ for smart initialization to speed up convergence, ‘random’ to choose n_clusters observations at random.
n_init	Number of Initializations	Number of times the k-means algorithm is run with different centroid seeds. The run with the lowest inertia is kept as the final result.
max_iter	Max Iterations	Maximum number of iterations for a single run of the k-means algorithm.
tol	Tolerance	Relative tolerance, with regard to the Frobenius norm of the difference in cluster centers between two consecutive iterations, used to declare convergence.
random_state	Random State	Seed for random number generation to ensure deterministic results. Leave empty for non-deterministic behavior.
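algorithm	Algorithm	K-Means algorithm variant to use, passed to scikit-learn (for example ‘lloyd’ or ‘elkan’; the accepted values depend on the scikit-learn version).
saveCentroidsPath	Save Centroids Path	Path at which to save the cluster centroids as a CSV file.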
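
The interaction between ‘Estimate K’, ‘K Search Range’, and ‘Optimization Metric’ can be sketched as follows for the Silhouette case. This is an assumption about the general approach, not the node's actual code: the helper `estimate_k` and its range parsing (both bounds treated as inclusive) are hypothetical, while `KMeans` and `silhouette_score` are standard scikit-learn calls.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def estimate_k(X, k_search_range="2,15", random_state=0):
    """Return the K in the range with the highest silhouette score.

    Hypothetical helper; assumes both ends of the range are inclusive.
    """
    low, high = (int(part) for part in k_search_range.split(","))
    # Silhouette needs at least 2 clusters and at most n_samples - 1.
    low = max(low, 2)
    high = min(high, len(X) - 1)

    scores = {}
    for k in range(low, high + 1):
        labels = KMeans(
            n_clusters=k, init="k-means++", n_init=10,
            max_iter=300, tol=1e-4, random_state=random_state,
        ).fit_predict(X)
        scores[k] = silhouette_score(X, labels)

    # Silhouette: pick the K with the maximum score.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: three tight, well-separated blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in blob_centers])

best_k, scores = estimate_k(X, "2,6")
print(best_k)
```

For the Inertia metric, the `max` step would be replaced by an elbow heuristic on the per-K inertia values, since inertia always decreases as K grows; that part is not sketched here.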