Data Cleansing Advanced

This node cleanses selected columns of the input dataset.

Input

It accepts a DataFrame from the previous node as input.

Output

This node outputs cleansed data

Type

transform

Class

fire.nodes.etl.NodeDataCleansingAdvanced

Fields

| Name | Title | Description |
|------|-------|-------------|
| Remove Nulls | Remove Nulls | |
| removeNullRows | Remove Null Rows | Removes rows with null values in any selected column; removes all rows if no columns are selected |
| nullRowsCols | Columns for Null Rows Check | Select columns to apply Remove Null Rows operation |
| removeNullColumns | Remove Null Columns | Removes columns with all null values; removes selected columns with at least one null if columns are specified |
| nullColsCols | Columns for Null Columns Check | Select columns to apply Remove Null Columns operation |
| Column Wise | Column-Level Cleansing | |
| inputCols | Input Columns | Select columns to be processed for data cleansing |
| replaceWithBlanks | Replace Nulls With Blanks (String Fields) | Replaces null values in string fields with empty strings ('') for the corresponding selected column |
| replaceWithZero | Replace Nulls With 0 (Numeric Fields) | Replaces null values in numeric fields with 0 for the corresponding selected column |
| removeWhitespaces | Remove Whitespaces | Removes whitespace characters from the corresponding selected columns |
| removeLetters | Remove Letters | Removes alphabetic characters from the corresponding selected columns |
| removeDigits | Remove Digits | Removes numeric digits from the corresponding selected columns |
| removeSigns | Remove Special Signs | Removes special symbols or signs from the corresponding selected columns |
| removeCommas | Remove Commas | Removes commas from the corresponding selected columns |
| modifyCases | Modify Case | Converts text in the corresponding selected columns to upper, lower, or title case |
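
The two modes of Remove Null Columns are easy to confuse, so here is a minimal Python sketch of the behaviour as described for that field. It works on plain row dictionaries and is not the node's actual implementation. The helper name `remove_null_columns`, the use of `None` for null, and the assumption that only the selected columns are examined when a selection is given are all illustrative choices.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def remove_null_columns(rows: List[Row], selected: Optional[List[str]] = None) -> List[Row]:
    """Hypothetical helper mirroring the Remove Null Columns description.

    - No columns selected: drop every column whose values are all null.
    - Columns selected: drop each selected column that has at least one null.
    """
    if not rows:
        return rows
    columns = list(rows[0].keys())
    if selected:
        # Assumption: unselected columns are never dropped in this mode.
        to_drop = {c for c in selected if any(r[c] is None for r in rows)}
    else:
        to_drop = {c for c in columns if all(r[c] is None for r in rows)}
    return [{c: r[c] for c in columns if c not in to_drop} for r in rows]


rows = [
    {"id": 1, "email": "a@x.com", "fax": None, "notes": None},
    {"id": 2, "email": None, "fax": None, "notes": "vip"},
]

no_selection = remove_null_columns(rows)
with_selection = remove_null_columns(rows, selected=["email", "notes"])

print(list(no_selection[0].keys()))    # ['id', 'email', 'notes']
print(list(with_selection[0].keys()))  # ['id', 'fax']
```

Without a selection only `fax` goes, because it is the only column that is null in every row. With `email` and `notes` selected, both are dropped because each contains at least one null.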

Details

Data Cleansing Advanced – Enterprise-Grade Column-Level Cleaning

This node is the per-column counterpart of the regular Data Cleansing node: instead of applying global rules, it gives you full control over each column. It is intended for data engineers building mission-critical, auditable, high-volume pipelines where every field has its own exact requirement.

When to Use Advanced vs Regular

  • Use Regular → 80% of cases (same rule for many columns)

  • Use Advanced → when you need different treatment per column (this node!)

Real Enterprise Scenarios

  • phone_number → remove everything except digits

  • email → lower case + trim

  • revenue → null → 0

  • customer_name → title case + trim

  • product_code → upper case + remove spaces

  • address_line → remove line breaks only
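
The scenarios above amount to a mapping from column name to an ordered list of rules. The Python sketch below illustrates that idea on a single record. It is an approximation rather than the node's implementation: the `RULES` mapping, the helper names, and the regular expressions are assumptions, and line breaks are replaced with a single space rather than deleted outright.

```python
import re
from typing import Any, Callable, Dict, List

Rule = Callable[[Any], Any]


def on_str(fn: Callable[[str], str]) -> Rule:
    """Wrap a string function so nulls and non-strings pass through unchanged."""
    def wrapper(value: Any) -> Any:
        return fn(value) if isinstance(value, str) else value
    return wrapper


def null_to_zero(value: Any) -> Any:
    """Replace a null with 0 and leave every other value as it is."""
    return 0 if value is None else value


# Hypothetical per-column rule lists, applied left to right.
# The regular expressions are assumptions about what each rule strips.
RULES: Dict[str, List[Rule]] = {
    "phone_number": [on_str(lambda v: re.sub(r"\D", "", v))],        # keep digits only
    "email": [on_str(str.strip), on_str(str.lower)],
    "revenue": [null_to_zero],
    "customer_name": [on_str(str.strip), on_str(str.title)],
    "product_code": [on_str(lambda v: v.replace(" ", "")), on_str(str.upper)],
    "address_line": [on_str(lambda v: re.sub(r"[\r\n]+", " ", v))],  # line breaks -> one space
}


def cleanse(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row with each column's rules applied; other columns are untouched."""
    out = dict(row)
    for column, rules in RULES.items():
        if column in out:
            for rule in rules:
                out[column] = rule(out[column])
    return out


raw = {
    "customer_id": 42,
    "phone_number": "(555) 123-4567",
    "email": "  John.Doe@Company.COM ",
    "revenue": None,
    "customer_name": " john doe ",
    "product_code": "ab 12 cd",
    "address_line": "12 Main St\nApt 4",
}

clean = cleanse(raw)
for column, value in clean.items():
    print(f"{column}: {value!r}")
```

Columns without an entry in `RULES`, such as `customer_id` here, pass through unchanged, which is the point of per-column control.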

Pro Tips

  • Always place this node right after Read/Union

  • Combine with “Select” afterward to reorder/drop temp columns

  • Use the regular Data Cleansing node first for global fixes, then Advanced for exceptions

Examples

Data Cleansing Advanced – Real-World Column-Specific Rules

Example 1 – Standard Customer Master Cleanse

| Column          | Rules Applied                                              | Before                     | After                  |
|-----------------|------------------------------------------------------------|----------------------------|------------------------|
| customer_id     | (none)                                                     |                            |                        |
| full_name       | Title Case + Trim + Remove Duplicate Spaces                | "  john   DOE  "           | "John Doe"             |
| email           | Lower Case + Trim                                          | "  John.Doe@Company.COM\n" | "john.doe@company.com" |
| phone           | Remove Letters + Remove Special Signs + Remove Whitespaces | "(555) 123-4567  "         | "5551234567"           |
| revenue_ytd     | Replace Null → 0                                           | null                       | 0                      |
| join_date       | (none)                                                     |                            |                        |

Example 2 – Mixed Financial Data

| Column          | Rules                                              | Result                        |
|-----------------|----------------------------------------------------|-------------------------------|
| amount_usd      | Replace Null → 0 + Remove Commas                   | "1,250.00" → 1250.00          |
| currency        | Upper Case + Trim                                  | "usd" → "USD"                 |
| description     | Remove Tabs/Line Breaks only                       | "Line1\nLine2" → "Line1 Line2" |
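
For the `amount_usd` row, the order of operations matters: the null is replaced and the commas are stripped before the text is read as a number. The following is a small Python sketch of that sequence under stated assumptions. The function name `clean_amount` is hypothetical, the comma is assumed to be a thousands separator, and `Decimal` is used here only so that the two decimal places survive; the node's own output type is not specified in this document.

```python
from decimal import Decimal
from typing import Optional


def clean_amount(raw: Optional[str]) -> Decimal:
    """Hypothetical helper: null -> 0, then strip commas and parse the rest."""
    if raw is None:
        return Decimal("0")
    # Assumption: commas are thousands separators, as in "1,250.00".
    return Decimal(raw.replace(",", "").strip())


print(clean_amount("1,250.00"))  # 1250.00
print(clean_amount(None))        # 0
```

Values that use a comma as the decimal mark (for example "1.250,00") would be misread by this approach, so Remove Commas is only safe on columns with a known number format.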

Example 3 – Product Catalog Standardization

| Column          | Rules                                              | Before                     | After                  |
|-----------------|----------------------------------------------------|----------------------------|------------------------|
| sku             | Upper Case + Remove Whitespaces + Remove Signs     | "abc-123 xyz"              | "ABC123XYZ"            |
| product_name    | Title Case + Trim                                  | "wireless mouse"           | "Wireless Mouse"       |
| price           | Replace Null → 0                                   | null                       | 0                      |

Example 4 – Remove Null Rows Only on Key Fields

  • nullRowsCols: customer_id, email

→ Any row with a null in either of these columns is dropped; rows that are missing only optional fields are kept.
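
The key-field filter in this example can be sketched in a few lines of Python. This is an illustration of the described behaviour on plain dictionaries, not the node's implementation; `remove_null_rows` is a hypothetical helper and `None` stands in for null.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def remove_null_rows(rows: List[Row], key_cols: List[str]) -> List[Row]:
    """Keep only rows where every key column is non-null."""
    return [r for r in rows if all(r.get(c) is not None for c in key_cols)]


rows = [
    {"customer_id": 1, "email": "a@x.com", "phone": None},      # optional field missing: kept
    {"customer_id": None, "email": "b@x.com", "phone": "555"},  # key missing: dropped
    {"customer_id": 3, "email": None, "phone": "556"},          # key missing: dropped
]

kept = remove_null_rows(rows, ["customer_id", "email"])
print([r["customer_id"] for r in kept])  # [1]
```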

Example 5 – Aggressive Garbage Cleanup

| Column          | Rules                                              |
|-----------------|----------------------------------------------------|
| legacy_id       | Remove Letters + Remove Signs → keep only digits  |
| notes           | Remove all Digits + Remove Special Signs          |