Gaussian Mixture
================

This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions, with associated mixing weights specifying each distribution's contribution to the composite.

Input
--------------

It takes in a DataFrame as input and performs Gaussian Mixture clustering.

Output
--------------

The input DataFrame is passed along to the next Processors.

Type
---------

ml-estimator

Class
---------

fire.nodes.ml.NodeGaussianMixture

Fields
---------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - featuresCol
     - Features Column
     - Features column of type VectorUDT for model fitting.
   * - k
     - K
     - The number of clusters to create.
   * - maxIter
     - Max Iterations
     - The maximum number of iterations.
   * - predictionCol
     - Prediction Column
     - The prediction column created during model scoring.
   * - seed
     - Seed
     - Random seed.
   * - tol
     - Tolerance
     - The convergence tolerance for iterative algorithms.
   * - weightCol
     - Weight Column
     - Name of the weight column.
   * - aggregationDepth
     - Aggregation Depth
     - Suggested depth for treeAggregate (>= 2).
   * - probabilityCol
     - Probability Column
     - Column name for predicted class conditional probabilities.

Details
-------

A Gaussian Mixture Model represents a composite distribution whereby points are drawn from one of k Gaussian sub-distributions, each with its own probability. The spark.ml implementation uses the expectation-maximization algorithm to induce the maximum-likelihood model given a set of samples. GaussianMixture is implemented as an Estimator and generates a GaussianMixtureModel as the base model.
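
As a sketch of what the expectation-maximization procedure computes, the standard textbook formulation of a GMM with k components is given below. The symbols (:math:`\pi_j` for mixing weights, :math:`\mu_j` and :math:`\Sigma_j` for component means and covariances, :math:`\gamma_{ij}` for responsibilities, :math:`n` for the number of samples) are notation assumed here for illustration; the spark.ml implementation may differ in details such as initialization.

.. math::

   p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j), \qquad \sum_{j=1}^{k} \pi_j = 1

   \gamma_{ij} = \frac{\pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \pi_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}

   N_j = \sum_{i=1}^{n} \gamma_{ij}, \qquad \pi_j = \frac{N_j}{n}, \qquad \mu_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, x_i

   \Sigma_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, (x_i - \mu_j)(x_i - \mu_j)^{\top}

The first equation is the mixture density, the second is the E-step, and the last two lines are the M-step. The two steps typically alternate until the change in log-likelihood falls below the convergence tolerance (``tol``) or the iteration limit (``maxIter``) is reached.
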
More details are available on the Apache Spark ML documentation page: https://spark.apache.org/docs/latest/ml-clustering.html#gaussian-mixture-model-gmm

Examples
--------

The example below is available at: https://spark.apache.org/docs/latest/ml-clustering.html#gaussian-mixture-model-gmm

.. code-block:: scala

   import org.apache.spark.ml.clustering.GaussianMixture

   // Assumes `spark` is an existing SparkSession (for example, in spark-shell).
   // Loads data.
   val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

   // Trains a Gaussian Mixture Model with k = 2 components.
   val gmm = new GaussianMixture()
     .setK(2)
   val model = gmm.fit(dataset)

   // Prints the weight, mean, and covariance of each fitted Gaussian component.
   for (i <- 0 until model.getK) {
     println(s"Gaussian $i:\nweight=${model.weights(i)}\n" +
       s"mu=${model.gaussians(i).mean}\nsigma=\n${model.gaussians(i).cov}\n")
   }
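
The fields listed in the table above correspond to parameters of the spark.ml ``GaussianMixture`` estimator. The following is a minimal sketch of that correspondence, not the Fire node's actual implementation. The local SparkSession, the toy data, and the column names ``cluster`` and ``clusterProbability`` are assumptions made for illustration; ``weightCol`` and ``aggregationDepth`` are left out to keep the sketch short.

.. code-block:: scala

   import org.apache.spark.ml.clustering.GaussianMixture
   import org.apache.spark.ml.linalg.Vectors
   import org.apache.spark.sql.SparkSession

   // Assumption: a local SparkSession, used here for illustration only.
   val spark = SparkSession.builder()
     .master("local[*]")
     .appName("GaussianMixtureFieldsSketch")
     .getOrCreate()

   // Made-up toy data: two well-separated groups of 2-D points.
   val data = spark.createDataFrame(Seq(
     Tuple1(Vectors.dense(0.0, 0.1)),
     Tuple1(Vectors.dense(0.2, -0.1)),
     Tuple1(Vectors.dense(-0.1, 0.3)),
     Tuple1(Vectors.dense(0.3, 0.2)),
     Tuple1(Vectors.dense(-0.2, -0.3)),
     Tuple1(Vectors.dense(0.1, -0.2)),
     Tuple1(Vectors.dense(9.0, 9.1)),
     Tuple1(Vectors.dense(9.2, 8.9)),
     Tuple1(Vectors.dense(8.9, 9.3)),
     Tuple1(Vectors.dense(9.3, 9.2)),
     Tuple1(Vectors.dense(8.8, 8.7)),
     Tuple1(Vectors.dense(9.1, 8.8))
   )).toDF("features")

   // Each setter mirrors one row of the Fields table.
   val gmm = new GaussianMixture()
     .setFeaturesCol("features")                 // featuresCol
     .setK(2)                                    // k
     .setMaxIter(100)                            // maxIter
     .setTol(0.01)                               // tol
     .setSeed(42L)                               // seed
     .setPredictionCol("cluster")                // predictionCol (hypothetical name)
     .setProbabilityCol("clusterProbability")    // probabilityCol (hypothetical name)

   // Fitting the estimator yields a GaussianMixtureModel.
   val model = gmm.fit(data)

   // Scoring appends the prediction and probability columns to the input DataFrame.
   val scored = model.transform(data)
   scored.show(truncate = false)

The scored DataFrame keeps the original ``features`` column and adds one column holding the assigned component index and one holding the per-component membership probabilities.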