Stop Words Remover¶
Filters out stop words from input. Null values from input array are preserved unless adding null to stopWords explicitly.
Output¶
It adds a new column containing the sequence of strings from the input column but with the stop words removed, to the incoming DataFrame.
Type¶
ml-transformer
Class¶
fire.nodes.ml.NodeStopWordsRemover
Fields¶
Name |
Title |
Description |
|---|---|---|
inputCol |
Input Column |
Column containing the array text from which the stop words have to be removed |
outputCol |
Output Column |
Contains array of text by dropping list of stop words |
caseSensitive |
Case Sensitive |
Case Sensitive |
stopWords |
Comma Separated List of Custom Stop Words. If not provided, the default list of stop words would be used. |
Custom List of Stop Words |
Details¶
Stop Words Remover Node Details¶
Stop words are words which should be excluded from the input, typically because the words appear frequently and don’t carry as much meaning.
The Stop Words Remover node takes as input a sequence of strings (e.g. the output of a Tokenizer) and drops all the stop words from the input sequences.
Input Parameters¶
OUTPUT STORAGE LEVEL : Keep this as DEFAULT.
INPUT COLUMN : Select the array column type from where we will remove the stop words.
OUTPUT COLUMN : The name of the output column.
CASE SENSITIVE : Specifies Whether to do a case sensitive comparison over the stop words. (Default = False)
COMMA SEPARATED LIST OF CUSTOM STOP WORDS : Optional, specify comma separated stop words. Note that the default stopwords original list can be found from “Glasgow Information Retrieval Group” http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words
Examples¶
Stop Words Remover Node Example¶
Assume that we have the following DataFrame with columns id and raw:
id | raw
----|----------
0 | [I, saw, the, red, balloon]
1 | [Mary, had, a, little, lamb]
Applying the StopWordsRemover node with raw as the input column and filtered as the output column, we should get the following:
id | raw | filtered
----|-----------------------------|--------------------
0 | [I, saw, the, red, balloon] | [saw, red, balloon]
1 | [Mary, had, a, little, lamb]|[Mary, little, lamb]