Find And Replace Using Regex¶
This node finds text that matches a regex pattern in the selected columns and replaces it with another value.
Input¶
It accepts a DataFrame as input from the previous node.
Type¶
transform
Class¶
fire.nodes.etl.NodeFindAndReplaceUsingRegex
Fields¶
| Name | Title | Description |
|---|---|---|
| inputCols | Input Columns | Columns on which to apply Regex |
| searchPattern | Find | Enter Search Pattern |
| replacePattern | Replace | Enter Replacement Value |
Details¶
Find and Replace Details¶
This node allows the user to find and replace patterns of text within the data. This node will only search the columns selected in the Input Columns option.
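The Find pattern must be a regular expression. Only text that matches the pattern is replaced. To match an entire value rather than part of it, anchor the pattern with ^ (start of string) and $ (end of string).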
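As a rough illustration of the behavior described above, the following Python sketch applies a regex replacement only to the selected columns. It is not the node's implementation: it uses Python's `re` module on a list of dictionaries, `find_and_replace` is a hypothetical helper, and its parameter names merely mirror the node's fields. The regex dialect used by the node may differ from Python's in some details.

```python
import re

def find_and_replace(rows, input_cols, search_pattern, replace_pattern):
    """Replace regex matches in the selected columns; leave other columns alone."""
    compiled = re.compile(search_pattern)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the incoming rows are not mutated
        for col in input_cols:
            new_row[col] = compiled.sub(replace_pattern, row[col])
        result.append(new_row)
    return result

# One row from the examples on this page, reduced to two columns.
rows = [{"EMP_NAME": "MARTIN", "DEPT": "MARKETING"}]

# Only DEPT is selected, so the "MAR" in EMP_NAME is left untouched.
out = find_and_replace(rows, ["DEPT"], "MAR", "mar")
print(out)  # [{'EMP_NAME': 'MARTIN', 'DEPT': 'marKETING'}]
```

Because the pattern is not anchored, only the matching part of the DEPT value changes, and the rest of the value is kept.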
Examples¶
Find and Replace Examples¶
The incoming DataFrame has the following rows:
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
---------------------------------------------------------------------------------------------------------------------------
E01 | DAVID | HR | 25 | 2021-01-01 | 12 000.00 | GOOD
E02 | JOHN | SALES | 35 | 2019-05-04 | 11 000.00 | VERY GOOD
E03 | MARTIN | MARKETING | 40 | 2018-06-07 | 34 000 | AVERAGE
E04 | TONY | MARKETING | 45 | 2017-02-01 | 12 500.00 | VERY VERY GOOD
E05 | MARK | HR | 25 | 2020-12-21 | 78 999.00 | BAD
If the FindAndReplaceUsingRegex node is configured to find the [-] character in the [DATE_OF_JOINING] column and replace it with [/]¶
With this configuration, every [-] in [DATE_OF_JOINING] is replaced with [/], and the outgoing DataFrame is:
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
---------------------------------------------------------------------------------------------------------------------------
E01 | DAVID | HR | 25 | 2021/01/01 | 12 000.00 | GOOD
E02 | JOHN | SALES | 35 | 2019/05/04 | 11 000.00 | VERY GOOD
E03 | MARTIN | MARKETING | 40 | 2018/06/07 | 34 000 | AVERAGE
E04 | TONY | MARKETING | 45 | 2017/02/01 | 12 500.00 | VERY VERY GOOD
E05 | MARK | HR | 25 | 2020/12/21 | 78 999.00 | BAD
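The date replacement above can be reproduced outside the node with a short Python sketch. It assumes Python's `re` module as a stand-in for the node's regex engine and works on a plain list holding the DATE_OF_JOINING values from the table.

```python
import re

# DATE_OF_JOINING values from the incoming DataFrame.
dates = ["2021-01-01", "2019-05-04", "2018-06-07", "2017-02-01", "2020-12-21"]

# The pattern "-" is not anchored, so every hyphen in each value is replaced.
replaced = [re.sub("-", "/", d) for d in dates]
print(replaced)
# ['2021/01/01', '2019/05/04', '2018/06/07', '2017/02/01', '2020/12/21']
```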
If the FindAndReplaceUsingRegex node is configured to find the [^VERY GOOD$] pattern in the [PERFORMANCE] column and replace it with [EXCELLENT]¶
Here [^] denotes the start of the string and [$] denotes the end of the string.
With this configuration, only the entry that is exactly [VERY GOOD] is replaced with [EXCELLENT]. Other entries are unchanged, and the outgoing DataFrame is:
EMP_CD | EMP_NAME | DEPT | AGE | DATE_OF_JOINING | SALARY | PERFORMANCE
---------------------------------------------------------------------------------------------------------------------------
E01 | DAVID | HR | 25 | 2021-01-01 | 12 000.00 | GOOD
E02 | JOHN | SALES | 35 | 2019-05-04 | 11 000.00 | EXCELLENT
E03 | MARTIN | MARKETING | 40 | 2018-06-07 | 34 000 | AVERAGE
E04 | TONY | MARKETING | 45 | 2017-02-01 | 12 500.00 | VERY VERY GOOD
E05 | MARK | HR | 25 | 2020-12-21 | 78 999.00 | BAD
Note: [VERY VERY GOOD] is not replaced because the anchored pattern matches only values that are exactly [VERY GOOD].
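To see why the anchors matter, the sketch below compares the anchored pattern with an unanchored one on the PERFORMANCE values from the table. It again uses Python's `re` module as an assumed stand-in for the node's regex engine. The unanchored variant is shown only for contrast and is not part of the example above.

```python
import re

# PERFORMANCE values from the incoming DataFrame.
values = ["GOOD", "VERY GOOD", "AVERAGE", "VERY VERY GOOD", "BAD"]

# Anchored: the whole value must be exactly "VERY GOOD".
anchored = [re.sub(r"^VERY GOOD$", "EXCELLENT", v) for v in values]
print(anchored)
# ['GOOD', 'EXCELLENT', 'AVERAGE', 'VERY VERY GOOD', 'BAD']

# Unanchored: "VERY GOOD" is also found inside "VERY VERY GOOD".
unanchored = [re.sub(r"VERY GOOD", "EXCELLENT", v) for v in values]
print(unanchored)
# ['GOOD', 'EXCELLENT', 'AVERAGE', 'VERY EXCELLENT', 'BAD']
```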