Vector Slicer
VectorSlicer is a feature-selection transformer that takes a feature vector and outputs a new feature vector containing a sub-array of the original features. It is useful for extracting features from a vector column.
Type
ml-transformer
Class
fire.nodes.ml.NodeVectorSlicer
Fields
| Name | Title | Description |
|---|---|---|
| inputCol | Features Column | The features column name |
| outputCol | Output Column | The output column name |
| indices | Indices | Comma-separated indices of the features to select |
| names | Names | Names of the features to select |
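The `indices` field is described as comma separated. As a minimal sketch, assuming the field value arrives as a plain string such as `1, 2`, it could be converted into the integer array that `VectorSlicer.setIndices` expects. The helper `parseIndices` is hypothetical; it is not part of the node or of Spark, and the node's actual parsing may differ.

```scala
object IndicesField {
  // Hypothetical helper (an assumption, not the node's real code):
  // split on commas, trim whitespace, drop empty entries, convert to Int.
  def parseIndices(raw: String): Array[Int] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).map(_.toInt)

  def main(args: Array[String]): Unit = {
    val indices = parseIndices("1, 2")
    println(indices.mkString("[", ", ", "]")) // prints [1, 2]
  }
}
```

A non-numeric entry makes `toInt` throw a `NumberFormatException`, so in this sketch a malformed field value fails early instead of silently selecting the wrong features.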
Details
VectorSlicer is a transformer that takes a feature vector and outputs a new feature vector with a sub-array of the original features. It is useful for extracting features from a vector column.
VectorSlicer accepts a vector column with specified indices, then outputs a new vector column whose values are selected via those indices.
More details are available at: http://spark.apache.org/docs/latest/ml-features.html#vectorslicer
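The index-based selection described above can be pictured with plain Scala arrays. This is only an illustrative sketch and not Spark's implementation: the helper `slice` is hypothetical and works on an `Array[Double]` rather than on a Spark vector column.

```scala
object SliceSketch {
  // Hypothetical helper: return the values at the given positions,
  // in the order the indices are listed.
  def slice(features: Array[Double], indices: Array[Int]): Array[Double] = {
    require(indices.distinct.length == indices.length, "indices must be unique")
    require(indices.forall(i => i >= 0 && i < features.length), "index out of range")
    indices.map(i => features(i))
  }

  def main(args: Array[String]): Unit = {
    val userFeatures = Array(-2.0, 2.3, 0.0)
    // Keep the features at positions 1 and 2.
    println(slice(userFeatures, Array(1, 2)).mkString("[", ", ", "]")) // prints [2.3, 0.0]
  }
}
```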
Examples
The example below is available at: http://spark.apache.org/docs/latest/ml-features.html#vectorslicer
import java.util.Arrays
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NumericAttribute}
import org.apache.spark.ml.feature.VectorSlicer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
// `spark` is the active SparkSession (predefined in spark-shell).
// One row holding a single dense feature vector.
val data = Arrays.asList(Row(Vectors.dense(-2.0, 2.3, 0.0)))
// Name the three features f1, f2, f3 so they can be selected by name.
val defaultAttr = NumericAttribute.defaultAttr
val attrs = Array("f1", "f2", "f3").map(defaultAttr.withName)
val attrGroup = new AttributeGroup("userFeatures", attrs.asInstanceOf[Array[Attribute]])
val dataset = spark.createDataFrame(data, StructType(Array(attrGroup.toStructField())))
val slicer = new VectorSlicer().setInputCol("userFeatures").setOutputCol("features")
// Select the feature at index 1 (f2) and the feature named f3.
slicer.setIndices(Array(1)).setNames(Array("f3"))
// or slicer.setIndices(Array(1, 2)), or slicer.setNames(Array("f2", "f3"))
val output = slicer.transform(dataset)
println(output.select("userFeatures", "features").first())
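The example combines `setIndices` and `setNames`. Per the Spark documentation, features selected by index come first in the output, followed by features selected by name, and the two selections must not overlap. The plain-Scala sketch below mimics that ordering on ordinary arrays. The helper `sliceByIndicesAndNames` and its `attrNames` parameter are hypothetical stand-ins for the attribute metadata Spark reads from the column; this is not Spark's implementation.

```scala
object NameSliceSketch {
  // Hypothetical helper: resolve names to positions, then pick features
  // selected by index first, followed by features selected by name.
  def sliceByIndicesAndNames(
      features: Array[Double],
      attrNames: Array[String],
      indices: Array[Int],
      names: Array[String]): Array[Double] = {
    val nameIndices = names.map { n =>
      val i = attrNames.indexOf(n)
      require(i >= 0, s"unknown feature name: $n")
      i
    }
    val selected = indices ++ nameIndices
    require(selected.nonEmpty, "at least one feature must be selected")
    require(selected.distinct.length == selected.length,
      "indices and names must not select the same feature twice")
    selected.map(i => features(i))
  }

  def main(args: Array[String]): Unit = {
    val out = sliceByIndicesAndNames(
      Array(-2.0, 2.3, 0.0), Array("f1", "f2", "f3"), Array(1), Array("f3"))
    println(out.mkString("[", ", ", "]")) // prints [2.3, 0.0]
  }
}
```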