N-Gram Transformer¶

Converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.

Input¶

It takes a DataFrame as input and transforms it into another DataFrame.

Output¶

It adds a new column to the incoming DataFrame. The column consists of a sequence of n-grams, where each n-gram is represented by a space-delimited string of n consecutive words.

Type¶

ml-transformer

Class¶

fire.nodes.ml.NodeNGramTransformer

Fields¶

Name	Title	Description
inputCol	Input Column	Contains sequence of strings
inputColStringArrCol	List of Words	Sequence of words
outputCol	Output Column	Consists of a sequence of n-grams where each n-gram is represented by a space-delimited string of n consecutive words
numberOfGrams	Number of Grams	Integer n, the number of words in each n-gram

Details¶

N-Gram Transformer Node Details¶

This node converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.
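The rules above can be illustrated with a minimal sketch in plain Python. This is not the node's implementation; the function name `ngrams` and its signature are hypothetical, and dropping null entries before the sliding window is applied is one reading of "null values are ignored".

```python
from typing import List, Optional, Sequence


def ngrams(words: Optional[Sequence[Optional[str]]], n: int = 2) -> List[str]:
    """Return space-joined n-grams of `words` (hypothetical helper, for illustration only)."""
    if n < 1:
        raise ValueError("n must be greater than or equal to 1")
    # Assumption: null entries are dropped before the sliding window is applied.
    tokens = [w for w in (words or []) if w is not None]
    # With fewer than n tokens the range is empty, so no n-grams are produced.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams(["this", "is", "a", "spam"], n=3))  # ['this is a', 'is a spam']
print(ngrams([], n=3))                           # []
print(ngrams(["too", "short"], n=3))             # []
print(ngrams(["a", None, "b"], n=2))             # ['a b']
```

The last three calls correspond to the edge cases in the description: empty input, input shorter than n, and a null entry in the input array.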

Input Parameters¶

OUTPUT STORAGE LEVEL : Keep this as DEFAULT.
INPUT COLUMN : Select the column in the input schema that contains the values to be converted.
LIST OF WORDS : Set the name of the field in the output schema that will contain the tokenized values.
OUTPUT COLUMN : Set the name of the field in the output schema that will contain the transformed features (the n-grams).
NUMBER OF GRAMS : Number of elements per n-gram; must be greater than or equal to 1. Default is 2.

Examples¶

N-Gram Transformer Node Example¶

Assume that we have the following DataFrame with columns id and message:

    id |       message      |
   ----|--------------------|
    0  | this is a spam     |
    1  | i am going to work |
    2  | this is not a spam |


Applying the **N-Gram Transformer** node with **message** as the input column, **Number of Grams** set to 3, **tokenText** as the list-of-words column, and **ngramText** as the output column, we should get the following:

 id |       message      |        tokenText         |                ngramText                 |
----|--------------------|--------------------------|------------------------------------------|
 0  | this is a spam     | (this, is, a, spam)      | (this is a, is a spam)                   |
 1  | i am going to work | (i, am, going, to, work) | (i am going, am going to, going to work) |
 2  | this is not a spam | (this, is, not, a, spam) | (this is not, is not a, not a spam)      |
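The table above can be reproduced with a short plain-Python sketch. It is an illustration only: the helper `add_ngram_columns` is hypothetical, rows are modeled as dictionaries rather than a DataFrame, and splitting on whitespace is an assumed stand-in for whatever tokenization the node applies.

```python
from typing import Any, Dict, List

rows = [
    {"id": 0, "message": "this is a spam"},
    {"id": 1, "message": "i am going to work"},
    {"id": 2, "message": "this is not a spam"},
]


def add_ngram_columns(rows: List[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Add tokenText and ngramText to each row (hypothetical helper)."""
    out = []
    for row in rows:
        # Assumption: tokens are obtained by splitting the message on whitespace.
        tokens = row["message"].split()
        # Zip n shifted views of the token list; zip stops at the shortest view,
        # so only complete windows of n consecutive words are emitted.
        grams = [" ".join(g) for g in zip(*(tokens[i:] for i in range(n)))]
        out.append({**row, "tokenText": tokens, "ngramText": grams})
    return out


result = add_ngram_columns(rows, n=3)
for row in result:
    print(row["id"], row["tokenText"], row["ngramText"])
```

Each printed row should match the corresponding tokenText and ngramText cells of the table.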