Regex Advanced¶
Advanced regex operations for text processing, similar to the Alteryx Regex Tool, with auto-detection of capturing groups.
Input¶
Accepts a DataFrame as input from the previous node.
Output¶
Returns a DataFrame with extracted patterns, marked matches, replaced text, or tokenized data based on the selected regex mode
Type¶
transform
Class¶
fire.nodes.etl.NodeRegexAdvanced
Fields¶
| Name | Title | Description |
|---|---|---|
| general | General | |
| inputCol | Input Column | Column to apply regex operations on |
| regexPattern | Regular Expression Pattern | Enter the regex pattern |
| regexMode | Regex Mode | Select the regex operation mode |
| caseSensitive | Case Sensitive | Enable case-sensitive pattern matching |
| errorHandling | Error Handling | How to handle errors: FAIL (stop execution), SKIP (remove row), IGNORE |
| replacementText | Replacement Text | Text to replace matched patterns with (REPLACE mode only) |
| tokenSplit | Split to Columns | Enter Number of columns to split |
| inputMatchCol | Match Column Name | Enter Column Name for Match Status |
| newColName | Target Column | Enter New Column Name |
| regularexpression | Expression | Regular Expression on how to get the data which has to be placed under this column |
| schema | InferSchema | |
| outputColNames | Column Names of the Table | Output Columns Names of the Table |
| outputColTypes | Column Types of the Table | Output Column Types of the Table |
| outputColFormats | Column Formats | Output Column Formats |
Details¶
Regex Advanced Node¶
Overview:¶
The Regex Advanced node provides powerful text processing capabilities using regular expressions, similar to the Alteryx Regex Tool. It allows parsing, tokenizing, matching, and replacing text in a DataFrame column. Users can also auto-detect capturing groups for parsing operations and control case-sensitivity and error handling.
Input:¶
Input Column: The column from the input DataFrame on which regex operations will be applied.
Regex Pattern: The regular expression pattern to extract, match, replace, or tokenize text.
Regex Mode: Select the operation mode:
PARSE – Extract data into new columns based on capturing groups.
TOKENIZE_COL – Split text into multiple columns.
TOKENIZE_ROW – Split text into multiple rows.
REPLACE – Replace matched patterns with specified text.
MATCH – Create a column marking whether the pattern matches.
Case Sensitivity: Specify whether pattern matching should be case-sensitive.
Error Handling: Choose how errors should be handled – FAIL, SKIP, or IGNORE.
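The case-sensitivity and error-handling options can be illustrated with a plain-Python sketch built on the standard `re` module. This is not the node's implementation. The helper names `apply_regex` and `first_match` are hypothetical, and two behaviours are assumptions because this page does not define them: any exception raised while processing a value counts as an "error", and IGNORE keeps the row unchanged.

```python
import re

def apply_regex(values, pattern, transform, case_sensitive=True, error_handling="FAIL"):
    """Apply transform(regex, value) to each value under an error policy (hypothetical helper)."""
    # Case-insensitive matching is switched on with a flag at compile time.
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    out = []
    for value in values:
        try:
            out.append(transform(regex, value))
        except Exception:
            if error_handling == "FAIL":
                raise              # FAIL: stop execution
            if error_handling == "SKIP":
                continue           # SKIP: remove the row
            out.append(value)      # IGNORE: assumed here to keep the row as-is
    return out

def first_match(regex, value):
    # Raises AttributeError when nothing matches and TypeError when value is None.
    return regex.search(value).group(0)

values = ["Error 42", "error 7", None]
skipped = apply_regex(values, r"error \d+", first_match,
                      case_sensitive=False, error_handling="SKIP")
strict = apply_regex(values, r"error \d+", first_match,
                     case_sensitive=True, error_handling="SKIP")
ignored = apply_regex(values, r"error \d+", first_match,
                      case_sensitive=False, error_handling="IGNORE")
```

With case-insensitive matching, both string values match and only the `None` row is dropped by SKIP. With case-sensitive matching, `"Error 42"` also fails to match and is dropped.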
Output:¶
Returns a transformed DataFrame with new columns or updated values based on the selected regex mode. Output may include:
Parsed columns from capturing groups (PARSE mode).
Tokenized columns or rows (TOKENIZE_COL/TOKENIZE_ROW).
Replaced text in the input column (REPLACE mode).
Match status column indicating success/failure (MATCH mode).
Advanced Options:¶
Replacement Text: Text to replace matches (REPLACE mode).
Split to Columns: Number of columns to split text into (TOKENIZE_COL mode).
Match Column Name: Name of the column storing match status (MATCH mode).
Target Column Names & Expressions (Parse tab): Map each capturing group to a new column name and its corresponding regular expression.
Infer Schema (Schema tab): Define output column names, types, and formats.
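The two tokenize modes have no worked example on this page, so the following plain-Python sketch shows one plausible reading of them. It assumes the pattern describes each token (rather than the delimiter), that surplus tokens are dropped in TOKENIZE_COL, and that missing tokens are filled with `None`. The function names are hypothetical and the node's actual behaviour may differ.

```python
import re

def tokenize_to_columns(text, pattern, num_columns):
    """Return exactly num_columns tokens for one row (TOKENIZE_COL-style sketch)."""
    tokens = [m.group(0) for m in re.finditer(pattern, text)]
    tokens = tokens[:num_columns]                          # assumption: extra tokens are dropped
    return tokens + [None] * (num_columns - len(tokens))   # assumption: pad with None

def tokenize_to_rows(row_id, text, pattern):
    """Return one (row_id, token) pair per match (TOKENIZE_ROW-style sketch)."""
    return [(row_id, m.group(0)) for m in re.finditer(pattern, text)]

# "[^,]+" matches each run of characters between commas.
cols = tokenize_to_columns("red,green,blue", r"[^,]+", 2)
padded = tokenize_to_columns("red", r"[^,]+", 3)
rows = tokenize_to_rows(1, "red,green,blue", r"[^,]+")
```

Here `cols` holds the first two tokens, `padded` holds one token followed by two empty slots, and `rows` repeats the row id once per token.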
Examples¶
Regex Advanced Node Examples¶
Example 1 – Parse Mode¶
Input DataFrame:
| id | info |
| -- | ------------------- |
| 1 | Name: John Age: 25 |
| 2 | Name: Alice Age: 30 |
| 3 | Name: Bob Age: 22 |
Node Configuration:
Input Column: info
Regex Pattern: Name:\s*(\w+)\s+Age:\s*(\d+)
Regex Mode: PARSE
Target Column Names: ["name", "age"]
Expressions: ["(\w+)", "(\d+)"]
Output DataFrame:
| id | info | name | age |
| -- | ------------------- | ----- | --- |
| 1 | Name: John Age: 25 | John | 25 |
| 2 | Name: Alice Age: 30 | Alice | 30 |
| 3 | Name: Bob Age: 22 | Bob | 22 |
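The parse step in this example can be reproduced outside the node with Python's standard `re` module, as a rough check of the pattern. The pattern is written with its escapes (`\s`, `\w`, `\d`). Filling unmatched rows with `None` is an assumption, and the node's own regex engine may differ from Python's in edge cases.

```python
import re

rows = [
    (1, "Name: John Age: 25"),
    (2, "Name: Alice Age: 30"),
    (3, "Name: Bob Age: 22"),
]

# Two capturing groups: group 1 -> name, group 2 -> age.
pattern = re.compile(r"Name:\s*(\w+)\s+Age:\s*(\d+)")

parsed = []
for row_id, info in rows:
    match = pattern.search(info)
    # Assumption: rows that do not match get empty values in the new columns.
    name, age = match.groups() if match else (None, None)
    parsed.append((row_id, info, name, age))
```

Note that `age` comes back as a string. In this sketch any conversion to a numeric type would be a separate step.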
Example 2 – Replace Mode¶
Input DataFrame:
| id | email |
| -- | --------------------- |
| 1 | john.doe@gmail.com |
| 2 | alice.smith@yahoo.com |
Node Configuration:
Input Column: email
Regex Pattern: @.*
Regex Mode: REPLACE
Replacement Text: @example.com
Output DataFrame:
| id | email |
| -- | ----------------------- |
| 1 | john.doe@example.com |
| 2 | alice.smith@example.com |
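As a quick way to sanity-check the replacement, the same substitution can be sketched with Python's `re.sub`. This is an illustration in plain Python, not the node's code.

```python
import re

emails = ["john.doe@gmail.com", "alice.smith@yahoo.com"]

# "@.*" matches from the first "@" to the end of the string,
# so the whole domain part is swapped for the replacement text.
replaced = [re.sub(r"@.*", "@example.com", email) for email in emails]
```

Because `.*` is greedy, everything after the first `@` is replaced in a single match.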
Example 3 – Match Mode¶
Input DataFrame:
| id | code |
| -- | ----- |
| 1 | AB123 |
| 2 | XY789 |
| 3 | 1234 |
Node Configuration:
Input Column: code
Regex Pattern: ^[A-Z]{2}\d{3}$
Regex Mode: MATCH
Match Column Name: is_valid
Output DataFrame:
| id | code | is_valid |
| -- | ----- | -------- |
| 1 | AB123 | true |
| 2 | XY789 | true |
| 3 | 1234 | false |
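The match check in this example can be sketched in plain Python as follows. The pattern is written with its `\d` escape, and the list name `is_valid` mirrors the Match Column Name above. This only illustrates the pattern logic and is not how the node is implemented.

```python
import re

codes = ["AB123", "XY789", "1234"]

# Anchored pattern: exactly two uppercase letters followed by exactly three digits.
pattern = re.compile(r"^[A-Z]{2}\d{3}$")

is_valid = [bool(pattern.search(code)) for code in codes]
```

The third code fails because it has no leading letters, which corresponds to the `false` row in the output table.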