N-Gram Transformer¶
Converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array length is less than n (the number of elements per n-gram), no n-grams are returned.
Input¶
It takes a DataFrame as input and transforms it into another DataFrame.
Output¶
It adds a new column to the incoming DataFrame. The column consists of a sequence of n-grams, where each n-gram is represented by a space-delimited string of n consecutive words.
Type¶
ml-transformer
Class¶
fire.nodes.ml.NodeNGramTransformer
Fields¶
| Name | Title | Description |
|---|---|---|
| inputCol | Input Column | Contains a sequence of strings |
| inputColStringArrCol | List of Words | Sequence of words |
| outputCol | Output Column | Consists of a sequence of n-grams where each n-gram is represented by a space-delimited string of n consecutive words |
| numberOfGrams | Number of Grams | Integer n, the number of consecutive words in each n-gram |
Details¶
N-Gram Transformer Node Details¶
This node converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array length is less than n (the number of elements per n-gram), no n-grams are returned.
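The behaviour described above can be sketched in plain Python. This is only an illustration of the stated rules (nulls ignored, empty input gives an empty result, inputs shorter than n give no n-grams), not the node's actual implementation; the function name `ngrams` is hypothetical.

```python
from typing import List, Optional, Sequence


def ngrams(words: Optional[Sequence[Optional[str]]], n: int = 2) -> List[str]:
    """Return space-separated n-grams built from consecutive words.

    Hypothetical helper that mirrors the rules described in this section.
    """
    if n < 1:
        raise ValueError("n must be greater than or equal to 1")
    # Null (None) entries are ignored, as described above.
    tokens = [w for w in (words or []) if w is not None]
    # range() is empty when len(tokens) < n, so short inputs yield no n-grams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams(["this", "is", "a", "spam"], n=3))  # ['this is a', 'is a spam']
print(ngrams([], n=3))                           # []
print(ngrams(["this", "is"], n=3))               # []
print(ngrams(["this", None, "is"], n=2))         # ['this is']
```

An input of k non-null words produces k - n + 1 n-grams when k is at least n, and none otherwise.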
Input Parameters¶
OUTPUT STORAGE LEVEL : Keep this as DEFAULT.
INPUT COLUMN : Select the column in the input schema that contains the text to be converted.
LIST OF WORDS : Set the name of the field in the output schema that will contain the tokenized values.
OUTPUT COLUMN : Set the name of the field in the output schema that will contain the generated n-grams.
NUMBER OF GRAMS : Number of words in each n-gram (n). Must be greater than or equal to 1. Default is 2.
Examples¶
N-Gram Transformer Node Example¶
Assume that we have the following DataFrame with columns id and message:
id  | message            |
----|--------------------|
0   | this is a spam     |
1   | i am going to work |
2   | this is not a spam |
Applying the **N-Gram Transformer** node with **message** as the input column, **Number of Grams** set to 3, **tokenText** as the list-of-words column and **ngramText** as the output column, we should get the following:
id  | message            | tokenText                | ngramText                                |
----|--------------------|--------------------------|------------------------------------------|
0   | this is a spam     | (this, is, a, spam)      | (this is a, is a spam)                   |
1   | i am going to work | (i, am, going, to, work) | (i am going, am going to, going to work) |
2   | this is not a spam | (this, is, not, a, spam) | (this is not, is not a, not a spam)      |
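The table above can be reproduced with a short plain-Python sketch. It assumes the messages are tokenized by splitting on whitespace, which this page does not state, and it models the DataFrame as a list of dictionaries. The function name `add_ngram_columns` is hypothetical and this is not the node's own code.

```python
from typing import Any, Dict, List

# The example rows, modelled as dictionaries instead of a DataFrame.
rows: List[Dict[str, Any]] = [
    {"id": 0, "message": "this is a spam"},
    {"id": 1, "message": "i am going to work"},
    {"id": 2, "message": "this is not a spam"},
]


def add_ngram_columns(rows: List[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return copies of the rows with tokenText and ngramText added."""
    result = []
    for row in rows:
        # Assumption: tokens are produced by whitespace splitting.
        tokens = row["message"].split()
        # Slide a window of n consecutive tokens across the message.
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        result.append({**row, "tokenText": tokens, "ngramText": grams})
    return result


for row in add_ngram_columns(rows, n=3):
    print(row["id"], row["tokenText"], row["ngramText"])
```

With n set to 3, the four-word message yields two n-grams and each five-word message yields three, matching the rows in the table.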