String Indexer¶
StringIndexer encodes a string column of labels as a column of label indices.
Input¶
It takes in a DataFrame and transforms it into another DataFrame.
Output¶
It adds a new column to the incoming DataFrame. The new column holds the label index assigned to each value in the string column of labels.
Type¶
ml-transformer
Class¶
fire.nodes.etl.NodeStringIndexer
Fields¶
| Name | Title | Description |
|---|---|---|
| inputCol | Input Columns | Input column |
| outputCol | Output Column | Output column |
Details¶
String Indexer Node Details¶
The String Indexer node encodes a string column of labels as a column of label indices. It takes in a DataFrame and returns a new DataFrame with one added column that holds the index assigned to each label.
The node takes three parameters: handleInvalid, which controls how invalid entries are handled; inputCol, the name of the input column; and outputCol, the name of the output column.
Input Parameters¶
HANDLE INVALID: Select whether to skip invalid entries or throw an error when one is found.
INPUT COLUMN: Select the required column for encoding.
OUTPUT COLUMN: The name of the output column after encoding.
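The three parameters above can be illustrated with a minimal plain-Python sketch. It is not the node's implementation: the helper names `fit_label_index` and `index_rows` are hypothetical, rows are modelled as dictionaries rather than DataFrame rows, and an "invalid" entry is assumed to mean a label that was not seen when the indices were assigned. The sketch also assumes indices are assigned by descending label frequency (the default ordering in Spark ML's StringIndexer), with ties broken alphabetically; this page does not state which ordering the node uses.

```python
from collections import Counter


def fit_label_index(labels):
    """Assign an index to each distinct label.

    Assumption: the most frequent label gets index 0, and ties are
    broken alphabetically. The node's actual ordering may differ.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def index_rows(rows, input_col, output_col, label_index, handle_invalid="error"):
    """Return copies of the rows with output_col added.

    handle_invalid="skip" drops rows whose label has no index;
    handle_invalid="error" raises ValueError on the first such row.
    """
    result = []
    for row in rows:
        label = row[input_col]
        if label not in label_index:
            if handle_invalid == "skip":
                continue  # drop the row that holds the invalid entry
            raise ValueError(f"Unseen label: {label!r}")
        # Copy the row so the input data is left unchanged.
        result.append({**row, output_col: label_index[label]})
    return result


# "a" appears three times, "c" twice, "b" once.
label_index = fit_label_index(["a", "b", "c", "a", "a", "c"])

rows = [{"id": 0, "category": "a"}, {"id": 1, "category": "d"}]
indexed = index_rows(rows, "category", "category_index", label_index,
                     handle_invalid="skip")
```

With `handle_invalid="skip"`, the row whose label `"d"` has no index is left out of the result; with `handle_invalid="error"`, the same call raises `ValueError` instead.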
Examples¶
String Indexer Node Example¶
Consider the following String Indexer output for the color column:
| id | color | encoded_color |
|---|---|---|
| 0 | red | 2 |
| 1 | green | 1 |
| 2 | blue | 0 |
| 3 | purple | 3 |
In this example, the input column is color and the output column is encoded_color. The String Indexer encodes each value in the color column as a label index. Because handleInvalid is set to skip, any invalid entries are skipped.
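As a rough illustration of the skip setting, the sketch below applies the label indices from the example to a list of rows in plain Python. The function `encode_colors` and the extra row with the color orange are hypothetical and are not part of the example data; the sketch also assumes that an invalid entry is a color with no assigned index.

```python
# Label indices copied from the example output for the color column.
COLOR_INDEX = {"blue": 0, "green": 1, "red": 2, "purple": 3}


def encode_colors(rows, handle_invalid="skip"):
    """Add encoded_color to each row, skipping or rejecting unknown colors."""
    encoded = []
    for row in rows:
        index = COLOR_INDEX.get(row["color"])
        if index is None:
            if handle_invalid == "skip":
                continue  # leave the invalid row out of the output
            raise ValueError(f"Unseen label: {row['color']!r}")
        encoded.append({**row, "encoded_color": index})
    return encoded


rows = [
    {"id": 0, "color": "red"},
    {"id": 1, "color": "green"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "purple"},
    {"id": 4, "color": "orange"},  # hypothetical row with no assigned index
]
encoded = encode_colors(rows)
```

Under these assumptions the four rows from the example keep their indices, and the hypothetical orange row does not appear in `encoded`.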