Regex Advanced¶

Performs advanced regex operations for text processing, similar to the Alteryx Regex Tool, with auto-detection of capturing groups.

Input¶

Accepts a DataFrame as input from the previous node.

Output¶

Returns a DataFrame with extracted patterns, marked matches, replaced text, or tokenized data based on the selected regex mode

Type¶

transform

Class¶

fire.nodes.etl.NodeRegexAdvanced

Fields¶

Name	Title	Description
general	General
inputCol	Input Column	Column to apply regex operations on
regexPattern	Regular Expression Pattern	Enter the regex pattern
regexMode	Regex Mode	Select the regex operation mode
caseSensitive	Case Sensitive	Enable case-sensitive pattern matching
errorHandling	Error Handling	How to handle errors: FAIL (stop execution), SKIP (remove row), IGNORE
replacementText	Replacement Text	Text to replace matched patterns with (REPLACE mode only)
tokenSplit	Split to Columns	Number of columns to split the text into
inputMatchCol	Match Column Name	Name of the column that stores the match status
newColName	Target Column	Name of the new column
regularexpression	Expression	Regular expression that extracts the data to place in this column
schema	InferSchema
outputColNames	Column Names of the Table	Output Columns Names of the Table
outputColTypes	Column Types of the Table	Output Column Types of the Table
outputColFormats	Column Formats	Output Column Formats

Details¶

Regex Advanced Node¶

Overview:¶

The Regex Advanced node provides powerful text processing capabilities using regular expressions, similar to the Alteryx Regex Tool. It allows parsing, tokenizing, matching, and replacing text in a DataFrame column. Users can also auto-detect capturing groups for parsing operations and control case-sensitivity and error handling.

Input:¶

Input Column: The column from the input DataFrame on which regex operations will be applied.
Regex Pattern: The regular expression pattern to extract, match, replace, or tokenize text.
Regex Mode: Select the operation mode:
PARSE – Extract data into new columns based on capturing groups.
TOKENIZE_COL – Split text into multiple columns.
TOKENIZE_ROW – Split text into multiple rows.
REPLACE – Replace matched patterns with specified text.
MATCH – Create a column marking whether the pattern matches.
Case Sensitivity: Specify whether pattern matching should be case-sensitive.
Error Handling: Choose how errors should be handled – FAIL, SKIP, or IGNORE.
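
The effect of the Case Sensitivity option can be illustrated outside the node with Python's standard `re` module. This is a minimal sketch, not the node's implementation; the `matches` helper and the sample text are hypothetical.

```python
import re

def matches(pattern: str, text: str, case_sensitive: bool = True) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    # Hypothetical helper: maps a case-sensitivity toggle onto re.IGNORECASE.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None

# Lowercase "name" does not match "Name" when matching is case-sensitive.
print(matches(r"name:\s*\w+", "Name: John"))                        # False
print(matches(r"name:\s*\w+", "Name: John", case_sensitive=False))  # True
```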

Output:¶

Returns a transformed DataFrame with new columns or updated values based on the selected regex mode. Output may include:

Parsed columns from capturing groups (PARSE mode).
Tokenized columns or rows (TOKENIZE_COL/TOKENIZE_ROW).
Replaced text in the input column (REPLACE mode).
Match status column indicating success/failure (MATCH mode).
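
The two tokenize modes have no worked example on this page, so the following Python sketch shows one plausible reading of them. It assumes that each match of the pattern becomes one token, as in the Alteryx Tokenize method; the node's actual behavior may differ. The helper names and sample data are hypothetical.

```python
import re

def tokenize_to_columns(text, pattern, n_cols):
    """Return exactly n_cols tokens, padded with None (TOKENIZE_COL-like)."""
    # With no capturing groups, re.findall returns the full matches.
    tokens = re.findall(pattern, text)[:n_cols]
    return tokens + [None] * (n_cols - len(tokens))

def tokenize_to_rows(row_id, text, pattern):
    """Return one (row_id, token) pair per match (TOKENIZE_ROW-like)."""
    return [(row_id, token) for token in re.findall(pattern, text)]

text = "red,green,blue"
print(tokenize_to_columns(text, r"[^,]+", 4))  # ['red', 'green', 'blue', None]
print(tokenize_to_rows(1, text, r"[^,]+"))     # [(1, 'red'), (1, 'green'), (1, 'blue')]
```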

Advanced Options:¶

Replacement Text: Text to replace matches (REPLACE mode).
Split to Columns: Number of columns to split text into (TOKENIZE_COL mode).
Match Column Name: Name of the column storing match status (MATCH mode).
Target Column Names & Expressions (Parse tab): Map capturing groups to new column names with corresponding regex expressions.
Infer Schema (Schema tab): Define output column names, types, and formats.

Examples¶

Regex Advanced Node Examples¶

Example 1 – Parse Mode¶

Input DataFrame:

| id | info                |
| -- | ------------------- |
| 1  | Name: John Age: 25  |
| 2  | Name: Alice Age: 30 |
| 3  | Name: Bob Age: 22   |

Node Configuration:
Input Column: info
Regex Pattern: Name:\s*(\w+)\s+Age:\s*(\d+)
Regex Mode: PARSE
Target Column Names: ["name", "age"]
Expressions: ["(\w+)", "(\d+)"]
Output DataFrame:

| id | info                | name  | age |
| -- | ------------------- | ----- | --- |
| 1  | Name: John Age: 25  | John  | 25  |
| 2  | Name: Alice Age: 30 | Alice | 30  |
| 3  | Name: Bob Age: 22   | Bob   | 22  |
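
To check the capturing groups outside the node, the same extraction can be reproduced with Python's standard `re` module, using the pattern `Name:\s*(\w+)\s+Age:\s*(\d+)`. This is an illustrative sketch rather than the node's implementation; the tuple layout is an assumption made for the example.

```python
import re

rows = [
    (1, "Name: John Age: 25"),
    (2, "Name: Alice Age: 30"),
    (3, "Name: Bob Age: 22"),
]
pattern = re.compile(r"Name:\s*(\w+)\s+Age:\s*(\d+)")

parsed = []
for row_id, info in rows:
    match = pattern.search(info)
    # Group 1 feeds the "name" column, group 2 the "age" column.
    name, age = match.groups() if match else (None, None)
    parsed.append((row_id, info, name, age))

for row in parsed:
    print(row)
```

Note that the captured age is a string here; converting it to a numeric type is a separate step.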

Example 2 – Replace Mode¶

Input DataFrame:

| id | email                 |
| -- | --------------------- |
| 1  | john.doe@gmail.com    |
| 2  | alice.smith@yahoo.com |

Node Configuration:
Input Column: email
Regex Pattern: @.*
Regex Mode: REPLACE
Replacement Text: @example.com
Output DataFrame:

| id | email                   |
| -- | ----------------------- |
| 1  | john.doe@example.com    |
| 2  | alice.smith@example.com |
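
The replacement in this example can be verified with Python's `re.sub`, shown here as a rough equivalent and not as the node's own code. The pattern `@.*` matches from the first `@` to the end of the value, so the whole domain is swapped out.

```python
import re

emails = ["john.doe@gmail.com", "alice.smith@yahoo.com"]

# Replace everything from the first "@" onward with the new domain.
replaced = [re.sub(r"@.*", "@example.com", email) for email in emails]

print(replaced)  # ['john.doe@example.com', 'alice.smith@example.com']
```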

Example 3 – Match Mode¶

Input DataFrame:

| id | code  |
| -- | ----- |
| 1  | AB123 |
| 2  | XY789 |
| 3  | 1234  |

Node Configuration:
Input Column: code
Regex Pattern: ^[A-Z]{2}\d{3}$
Regex Mode: MATCH
Match Column Name: is_valid
Output DataFrame:

| id | code  | is_valid |
| -- | ----- | -------- |
| 1  | AB123 | true     |
| 2  | XY789 | true     |
| 3  | 1234  | false    |
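
The match status in this example can be reproduced with Python's `re` module using the pattern `^[A-Z]{2}\d{3}$` (two uppercase letters followed by three digits). This is a sketch for checking the pattern, not the node's implementation.

```python
import re

codes = ["AB123", "XY789", "1234"]
pattern = re.compile(r"^[A-Z]{2}\d{3}$")

# A value is valid only if the whole string fits the pattern.
is_valid = [pattern.search(code) is not None for code in codes]

print(is_valid)  # [True, True, False]
```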