Regex Advanced

Advanced regex operations for text processing, similar to the Alteryx Regex Tool, with auto-detection of capturing groups

Input

It accepts a DataFrame as input from the previous Node

Output

Returns a DataFrame with extracted patterns, marked matches, replaced text, or tokenized data based on the selected regex mode

Type

transform

Class

fire.nodes.etl.NodeRegexAdvanced

Fields

| Name              | Title                      | Description                                                            |
| ----------------- | -------------------------- | ---------------------------------------------------------------------- |
| general           | General                    |                                                                        |
| inputCol          | Input Column               | Column to apply regex operations on                                    |
| regexPattern      | Regular Expression Pattern | Enter the regex pattern                                                |
| regexMode         | Regex Mode                 | Select the regex operation mode                                        |
| caseSensitive     | Case Sensitive             | Enable case-sensitive pattern matching                                 |
| errorHandling     | Error Handling             | How to handle errors: FAIL (stop execution), SKIP (remove row), IGNORE |
| replacementText   | Replacement Text           | Text to replace matched patterns with (REPLACE mode only)              |
| tokenSplit        | Split to Columns           | Number of columns to split the text into                               |
| inputMatchCol     | Match Column Name          | Name of the column that stores the match status                        |
| newColName        | Target Column              | Name of the new column                                                 |
| regularexpression | Expression                 | Regular expression that extracts the data to place in this column      |
| schema            | InferSchema                |                                                                        |
| outputColNames    | Column Names of the Table  | Output column names of the table                                       |
| outputColTypes    | Column Types of the Table  | Output column types of the table                                       |
| outputColFormats  | Column Formats             | Output column formats                                                  |

Details

Regex Advanced Node

Overview:

The Regex Advanced node provides powerful text processing capabilities using regular expressions, similar to the Alteryx Regex Tool. It allows parsing, tokenizing, matching, and replacing text in a DataFrame column. Users can also auto-detect capturing groups for parsing operations and control case-sensitivity and error handling.

Input:

  • Input Column: The column from the input DataFrame on which regex operations will be applied.

  • Regex Pattern: The regular expression pattern to extract, match, replace, or tokenize text.

  • Regex Mode: Select the operation mode:

      • PARSE – Extract data into new columns based on capturing groups.

      • TOKENIZE_COL – Split text into multiple columns.

      • TOKENIZE_ROW – Split text into multiple rows.

      • REPLACE – Replace matched patterns with specified text.

      • MATCH – Create a column marking whether the pattern matches.

  • Case Sensitivity: Specify whether pattern matching should be case-sensitive.

  • Error Handling: Choose how errors should be handled – FAIL, SKIP, or IGNORE.
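
The effect of the Case Sensitivity option can be illustrated outside the node with Python's standard `re` module. This is only a sketch of the general behavior, not the node's implementation; the helper `compile_pattern` is a hypothetical name introduced for illustration.

```python
import re

def compile_pattern(pattern: str, case_sensitive: bool) -> re.Pattern:
    # Hypothetical helper: maps a case-sensitivity switch to a regex flag.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(pattern, flags)

text = "Name: John Age: 25"

# The pattern is written in lower case on purpose.
strict = compile_pattern(r"name:\s*(\w+)", case_sensitive=True)
loose = compile_pattern(r"name:\s*(\w+)", case_sensitive=False)

strict_hit = strict.search(text)  # no match: "name" differs from "Name"
loose_hit = loose.search(text)    # matches and captures the word after "Name:"
```

With case sensitivity enabled, a lower-case pattern does not match the capitalized input; with it disabled, the same pattern matches.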

Output:

Returns a transformed DataFrame with new columns or updated values based on the selected regex mode. Output may include:

  • Parsed columns from capturing groups (PARSE mode).

  • Tokenized columns or rows (TOKENIZE_COL/TOKENIZE_ROW).

  • Replaced text in the input column (REPLACE mode).

  • Match status column indicating success/failure (MATCH mode).

Advanced Options:

  • Replacement Text: Text to replace matches (REPLACE mode).

  • Split to Columns: Number of columns to split text into (TOKENIZE_COL mode).

  • Match Column Name: Name of the column storing match status (MATCH mode).

  • Target Column Names & Expressions (Parse tab): Map capturing groups to new column names with corresponding regex expressions.

  • Infer Schema (Schema tab): Define output column names, types, and formats.
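
The two tokenize modes have no worked example on this page, so the following Python sketch illustrates one plausible reading of them. It assumes that the pattern describes each token rather than the delimiter, and that tokens beyond the configured column count are dropped; neither assumption is confirmed here. The helpers `tokenize_to_columns` and `tokenize_to_rows` are hypothetical names, and the code uses the standard `re` module rather than the node itself.

```python
import re

def tokenize_to_columns(text, pattern, num_columns):
    # Assumption: each regex match is one token; extra tokens are dropped
    # and missing ones are padded with None.
    tokens = re.findall(pattern, text)[:num_columns]
    return tokens + [None] * (num_columns - len(tokens))

def tokenize_to_rows(row_id, text, pattern):
    # Assumption: each token becomes its own row, keyed by the original id.
    return [(row_id, token) for token in re.findall(pattern, text)]

token_pattern = r"[^,]+"  # one or more characters that are not a comma

cols = tokenize_to_columns("red,green,blue", token_pattern, 4)
rows = tokenize_to_rows(1, "red,green,blue", token_pattern)
```

Under these assumptions, TOKENIZE_COL widens a row (here to four columns, the last one empty), while TOKENIZE_ROW produces one output row per token.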

Examples

Regex Advanced Node Examples

Example 1 – Parse Mode

Input DataFrame:

| id | info                |
| -- | ------------------- |
| 1  | Name: John Age: 25  |
| 2  | Name: Alice Age: 30 |
| 3  | Name: Bob Age: 22   |

Node Configuration:

  • Input Column: info

  • Regex Pattern: Name:\s*(\w+)\s+Age:\s*(\d+)

  • Regex Mode: PARSE

  • Target Column Names: ["name", "age"]

  • Expressions: ["(\w+)", "(\d+)"]

Output DataFrame:

| id | info                | name  | age |
| -- | ------------------- | ----- | --- |
| 1  | Name: John Age: 25  | John  | 25  |
| 2  | Name: Alice Age: 30 | Alice | 30  |
| 3  | Name: Bob Age: 22   | Bob   | 22  |
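
The parse step in Example 1 can be reproduced in plain Python to check the pattern before configuring the node. This is a sketch using the standard `re` module on a list of dictionaries, not the node's own code; `parse_row` and `TARGET_COLUMNS` are hypothetical names, and filling unmatched rows with `None` is an assumption.

```python
import re

# The pattern from Example 1, with two capturing groups.
PATTERN = re.compile(r"Name:\s*(\w+)\s+Age:\s*(\d+)")
TARGET_COLUMNS = ["name", "age"]

rows = [
    {"id": 1, "info": "Name: John Age: 25"},
    {"id": 2, "info": "Name: Alice Age: 30"},
    {"id": 3, "info": "Name: Bob Age: 22"},
]

def parse_row(row):
    # Map each capturing group to its target column, in order.
    match = PATTERN.search(row["info"])
    values = match.groups() if match else (None,) * len(TARGET_COLUMNS)
    return {**row, **dict(zip(TARGET_COLUMNS, values))}

parsed = [parse_row(row) for row in rows]
```

In this sketch the captured values are strings, so `age` comes back as `"25"` rather than `25`.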

Example 2 – Replace Mode

Input DataFrame:

| id | email                 |
| -- | --------------------- |
| 1  | john.doe@gmail.com    |
| 2  | alice.smith@yahoo.com |

Node Configuration:

  • Input Column: email

  • Regex Pattern: @.*

  • Regex Mode: REPLACE

  • Replacement Text: @example.com

Output DataFrame:

| id | email                   |
| -- | ----------------------- |
| 1  | john.doe@example.com    |
| 2  | alice.smith@example.com |
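
The replacement in Example 2 can be checked with a short Python sketch. It uses `re.sub` from the standard library as a stand-in for the node's REPLACE mode, which may differ in details such as how special characters in the replacement text are treated.

```python
import re

emails = ["john.doe@gmail.com", "alice.smith@yahoo.com"]

# "@.*" matches the "@" and everything after it, so the whole domain is replaced.
replaced = [re.sub(r"@.*", "@example.com", email) for email in emails]
```

Because `.*` is greedy, the match runs to the end of the value, which is why the entire domain is swapped.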

Example 3 – Match Mode

Input DataFrame:

| id | code  |
| -- | ----- |
| 1  | AB123 |
| 2  | XY789 |
| 3  | 1234  |

Node Configuration:

  • Input Column: code

  • Regex Pattern: ^[A-Z]{2}\d{3}$

  • Regex Mode: MATCH

  • Match Column Name: is_valid

Output DataFrame:

| id | code  | is_valid |
| -- | ----- | -------- |
| 1  | AB123 | true     |
| 2  | XY789 | true     |
| 3  | 1234  | false    |
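
The match check in Example 3 can be sketched in Python as follows. This uses the standard `re` module to illustrate what the pattern accepts; it is not the node's implementation, and the variable names are illustrative.

```python
import re

# Two upper-case letters followed by exactly three digits, anchored at both ends.
CODE_PATTERN = re.compile(r"^[A-Z]{2}\d{3}$")

codes = ["AB123", "XY789", "1234"]
is_valid = [CODE_PATTERN.search(code) is not None for code in codes]

# With case-sensitive matching, a lower-case code does not match.
lower_case_valid = CODE_PATTERN.search("ab123") is not None
```

The `^` and `$` anchors matter here: without them, a longer value that merely contains a valid code would also be marked as a match.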