K-Means
===========

K-means clustering with support for ``k-means||`` initialization proposed by Bahmani et al.

Input
--------------

Takes a DataFrame as input and performs K-Means clustering on it.

Output
--------------

The input DataFrame is passed along to the next Processors.

Type
---------

ml-estimator

Class
---------

fire.nodes.ml.NodeKMeans

Fields
---------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - modelIdentifier
     - Model Identifier
     - Starts with ``$loop``, followed by column names separated by underscores. Example: ``$loop_columnName1_columnName2``.
   * - featuresCol
     - Features Column
     - Features column of type vectorUDT used for model fitting.
   * - k
     - K
     - The number of clusters to create.
   * - maxIter
     - Max Iterations
     - The maximum number of iterations.
   * - predictionCol
     - Prediction Column
     - The prediction column created during model scoring.
   * - seed
     - Seed
     - Random seed.
   * - tol
     - Tolerance
     - The convergence tolerance for iterative algorithms.
   * - initMode
     - initMode
     - The initialization algorithm mode (``random`` or ``k-means||``).
   * - initSteps
     - initSteps
     - The number of steps for the ``k-means||`` initialization mode. It is ignored when another initialization mode is chosen.
   * - distanceMeasure
     - distanceMeasure
     - The distance measure to use (``euclidean`` or ``cosine``).
   * - weightCol
     - Weight Column
     - Column containing instance weights. If not set, all instances are treated as having weight 1.0.

Details
-------

::

    k-means is one of the most commonly used clustering algorithms. It
    clusters the data points into a predefined number of clusters. The MLlib
    implementation includes a parallelized variant of the k-means++ method
    called kmeans||.

    KMeans is implemented as an Estimator and generates a KMeansModel as the
    base model.

    More details are available on the Apache Spark ML docs page:
    https://spark.apache.org/docs/latest/ml-clustering.html#k-means

Examples
--------

The example below is available at: https://spark.apache.org/docs/latest/ml-clustering.html#k-means

::

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.evaluation.ClusteringEvaluator

    // Loads data (assumes an active SparkSession named `spark`).
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

    // Trains a k-means model.
    val kmeans = new KMeans().setK(2).setSeed(1L)
    val model = kmeans.fit(dataset)

    // Make predictions
    val predictions = model.transform(dataset)

    // Evaluate clustering by computing Silhouette score
    val evaluator = new ClusteringEvaluator()
    val silhouette = evaluator.evaluate(predictions)
    println(s"Silhouette with squared euclidean distance = $silhouette")

    // Shows the result.
    println("Cluster Centers: ")
    model.clusterCenters.foreach(println)
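
The fields listed for this node correspond by name to parameters of Spark ML's ``KMeans`` estimator. The sketch below shows one way those parameters might be set directly in Spark. It is only an illustration: how ``fire.nodes.ml.NodeKMeans`` passes its fields to Spark is an assumption here, and the object name, column names (``x``, ``y``, ``features``, ``cluster``), parameter values and sample data are all hypothetical. It also assembles the numeric columns into a vector column first, since the features column must be of vector type. ``weightCol`` is left unset.

::

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object KMeansFieldsSketch {
      def main(args: Array[String]): Unit = {
        // Local session, for illustration only.
        val spark = SparkSession.builder()
          .appName("KMeansFieldsSketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical data: two numeric columns forming two separated groups.
        val df = Seq(
          (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (9.0, 9.1), (9.2, 9.0), (9.1, 9.2)
        ).toDF("x", "y")

        // featuresCol must hold vectors, so assemble the numeric columns first.
        val assembled = new VectorAssembler()
          .setInputCols(Array("x", "y"))
          .setOutputCol("features")
          .transform(df)

        // Each setter mirrors one field from the table above (assumed mapping).
        val kmeans = new KMeans()
          .setFeaturesCol("features")      // featuresCol
          .setPredictionCol("cluster")     // predictionCol
          .setK(2)                         // k
          .setMaxIter(20)                  // maxIter
          .setTol(1e-4)                    // tol
          .setSeed(1L)                     // seed
          .setInitMode("k-means||")        // initMode
          .setInitSteps(2)                 // initSteps, used only with k-means||
          .setDistanceMeasure("euclidean") // distanceMeasure

        // Fit the estimator, then score the same rows.
        val model = kmeans.fit(assembled)
        val predictions = model.transform(assembled)
        predictions.select("x", "y", "cluster").show()

        spark.stop()
      }
    }

Fixing ``seed`` keeps the initialization reproducible between runs, which helps when comparing different values of ``k``.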