Data Cleansing

One-stop data quality powerhouse – instantly clean dozens of common messy data issues: null handling, whitespace, unwanted characters, case standardization, and more. Perfect for preparing raw source data (CSV, Excel, APIs, logs) before analytics, modeling, or reporting.

Input

This node accepts a DataFrame as input from the previous node.

Output

This node outputs the cleansed data.

Type

transform

Class

fire.nodes.etl.NodeDataCleansing

Fields

| Name | Title | Description |
|------|-------|-------------|
| Columns | Columns | |
| inputCols | Select Columns | Columns you want to clean. Leave empty to apply settings globally where it makes sense (e.g., trim whitespace on all string columns). |
| Remove Nulls | Remove Nulls | |
| removeNullRows | Remove Null Rows | Drop entire rows that contain nulls in the selected columns (or in any column if none are selected). Great for strict data quality requirements. |
| removeNullColumns | Remove Null Columns | Drop entire columns that are completely null/empty. Useful after a Union when some sources don't have certain fields. |
| Replace Nulls | Replace Nulls | |
| replaceWithBlanks | Replace Nulls → Blank (String fields) | Replace nulls in string columns with an empty string ('') instead of the literal 'null'. Makes downstream joins and reports look clean. |
| replaceWithZero | Replace Nulls → 0 (Numeric fields) | Replace nulls in numeric columns with 0. Useful for row-level arithmetic and reports, where a null would otherwise propagate or show up as missing. Note that averages change: zeros are counted, whereas SQL-style aggregates skip nulls. |
| Remove Unwanted Characters | Remove Unwanted Characters | |
| trimWhitespace | Trim Leading/Trailing Whitespace | Remove spaces before/after text (e.g., ' John ' → 'John'). One of the most common data issues. |
| removeTabsLineBreaks | Remove Tabs, Line Breaks & Duplicate Spaces | Clean up copy-paste mess: replaces \t, \n, \r and multiple spaces with a single space. |
| allWhiteSpace | All Whitespace Characters | Remove every kind of whitespace (including non-breaking spaces). |
| letters | Letters (A-Z, a-z) | Strip all letters. Useful for extracting numbers from mixed fields. |
| lettersExceptions | Letters Exceptions | Comma-separated letters to KEEP (e.g., 'A,E,I,O,U' to keep vowels). |
| numbers | Numbers (0-9) | Strip all digits. Useful for cleaning names that contain numbers. |
| numbersExceptions | Numbers Exceptions | Digits to KEEP (e.g., '123' to preserve house numbers). |
| punctuation | Punctuation & Symbols | Remove ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { \| } ~ etc. |
| punctuationExceptions | Punctuation Exceptions | Symbols to KEEP (e.g., '.,-' for decimal numbers, plus an apostrophe for names like O'Connor). |
| Modify Case | Modify Case | |
| modifyCase | Modify Case | Standardize text case: Upper Case → JOHN DOE; Lower Case → john doe; Title Case → John Doe; Default → no change. |
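
To make the interaction between the character-class switches (letters, numbers, punctuation) and their exception lists concrete, here is a minimal Python sketch. It is an illustration only, not the node's implementation: the function `remove_chars` and its parameter names are hypothetical. It assumes that exceptions are matched case-sensitively, that letter exceptions are comma-separated, and that digit and punctuation exceptions are plain strings of characters.

```python
import string


def remove_chars(value, *, letters=False, letters_keep="",
                 numbers=False, numbers_keep="",
                 punctuation=False, punctuation_keep=""):
    """Drop whole character classes from `value`, except the characters to keep.

    Hypothetical helper for illustration; not the node's actual code.
    """
    # Assumption: letter exceptions arrive comma-separated, e.g. "A,E,I,O,U".
    keep_letters = set(letters_keep.replace(",", ""))
    # Assumption: digit and punctuation exceptions are plain strings, e.g. ".,-".
    keep_digits = set(numbers_keep)
    keep_punct = set(punctuation_keep)

    kept = []
    for ch in value:
        if letters and ch in string.ascii_letters and ch not in keep_letters:
            continue  # a letter that is not on the keep list
        if numbers and ch in string.digits and ch not in keep_digits:
            continue  # a digit that is not on the keep list
        if punctuation and ch in string.punctuation and ch not in keep_punct:
            continue  # a symbol that is not on the keep list
        kept.append(ch)
    return "".join(kept)


print(remove_chars("ABC-123!@#XYZ", letters=True, punctuation=True))       # 123
print(remove_chars("$1,250.00", punctuation=True, punctuation_keep="."))   # 1250.00
```

Whitespace is deliberately left alone in this sketch; in the node it is handled by the separate whitespace options.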

Details

Data Cleansing Node – Fix the Most Common Real-World Data Mess in One Click

The Data Cleansing node is the fastest way to turn dirty, inconsistent source data into clean, trusted, analysis-ready tables. Used by thousands of analysts daily to eliminate the most common (and frustrating) data quality issues instantly.

Real-World Problems It Solves Instantly

  • CSV/Excel files with extra spaces, tabs, line breaks

  • Nulls that show up as blanks, the literal string “null”, or actual nulls, breaking sums

  • Mixed case names/emails (JoHn.DoE@company.com)

  • Phone numbers in mixed formats, such as (123) 456-7890, with dashes, or with spaces

  • Product codes with hidden characters

  • Copied data from PDFs/websites with garbage symbols
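
The whitespace problems in this list map onto three different options, and the difference between them is easy to miss. The sketch below contrasts them on one value. It is an approximation in plain Python, not the node's code: the three function names are hypothetical, and it assumes that the "Remove Tabs, Line Breaks & Duplicate Spaces" option touches only \t, \n, \r and ordinary spaces, while "All Whitespace Characters" also covers Unicode whitespace such as the non-breaking space.

```python
import re


def trim(value):
    """Strip leading and trailing whitespace only."""
    return value.strip()


def collapse_whitespace(value):
    """Replace each run of tabs, line breaks and spaces with a single space."""
    return re.sub(r"[\t\n\r ]+", " ", value)


def remove_all_whitespace(value):
    """Delete every whitespace character, including non-breaking spaces."""
    # For str patterns, \s matches Unicode whitespace, so U+00A0 is included.
    return re.sub(r"\s+", "", value)


raw = "  SKU\u00a0456\t\r\n  A1  "   # contains a hidden non-breaking space
print(repr(trim(raw)))
print(repr(collapse_whitespace(raw)))
print(repr(remove_all_whitespace(raw)))
```

Only the last function removes the hidden non-breaking space inside the value, which is the kind of character that makes two product codes look identical but fail to match.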

Best Practice Combinations

  1. Standard Clean Profile (most common):

  • Select all string columns

  • Trim Whitespace: true

  • Remove Tabs/Line Breaks: true

  • Replace Nulls → Blank: true

  • Title Case

  2. Phone/Email Clean:

  • Remove Punctuation (except @ and .)

  • Trim + Lower Case

  3. Numeric Clean:

  • Replace Nulls → 0

  • Remove Letters + Punctuation
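
The Standard Clean Profile can be read as a small pipeline applied to each string value. The following Python sketch shows one plausible ordering of its steps. It is a sketch under stated assumptions rather than the node's implementation: `standard_clean` is a hypothetical name, the order of operations is assumed, and Python's `str.title()` stands in for the node's Title Case option, whose exact rules may differ.

```python
import re


def standard_clean(value):
    """Apply the Standard Clean Profile steps to one string value (or None)."""
    if value is None:
        return ""                              # Replace Nulls -> Blank
    value = re.sub(r"[\t\n\r ]+", " ", value)  # Remove Tabs/Line Breaks/duplicate spaces
    value = value.strip()                      # Trim Whitespace
    return value.title()                       # Title Case (str.title as a stand-in)


for raw in ["  john DOE  ", None, "  john   doe ", "MARY-ANNE SMITH"]:
    print(repr(raw), "->", repr(standard_clean(raw)))
```

Note that `str.title()` capitalizes after every non-letter, so an email address such as "jane.smith@Company.com" would come out as "Jane.Smith@Company.Com" here. That is one reason to prefer Lower Case for email columns, as in the Phone/Email profile.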

Pro Tips

  • Run this node right after any Read/Union node

  • Follow it with a “Select” node to drop or reorder columns

  • Use “Remove Null Columns” after Union Advanced to clean up schema drift
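
The schema-drift tip can be illustrated with a small sketch of what the two "Remove Nulls" options are described as doing. It uses plain Python lists of dictionaries in place of a DataFrame. The function names and sample data are hypothetical, and the logic is an assumption based on the field descriptions, not the node's code.

```python
def remove_null_columns(rows):
    """Drop columns whose value is None in every row."""
    columns = list(dict.fromkeys(name for row in rows for name in row))
    keep = [c for c in columns if any(row.get(c) is not None for row in rows)]
    return [{c: row.get(c) for c in keep} for row in rows]


def remove_null_rows(rows, columns=None):
    """Drop rows with a None in any of `columns` (all columns if omitted)."""
    def has_null(row):
        names = columns if columns else row.keys()
        return any(row.get(name) is None for name in names)
    return [row for row in rows if not has_null(row)]


# After a union, the "fax" column is null in every row.
unioned = [
    {"id": 1, "email": "a@example.com", "fax": None},
    {"id": 2, "email": None, "fax": None},
    {"id": 3, "email": "c@example.com", "fax": None},
]

slim = remove_null_columns(unioned)                 # "fax" disappears
strict = remove_null_rows(slim, columns=["email"])  # row 2 disappears
print(slim)
print(strict)
```

The order matters in this example: running the row filter on all columns before dropping the empty "fax" column would remove every row, because each row has a null in "fax".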

Examples

Data Cleansing – Before & After Real Examples

Example 1 – Typical Messy Customer Import

| Raw Data                          | After Standard Clean Profile |
|-----------------------------------|------------------------------|
| "  john DOE  "                    | "John Doe"                   |
| null                              | "" (blank)                   |
| "jane.smith@Company.com\n"        | "jane.smith@company.com" (using Lower Case instead of Title Case) |
| "O'Connor, Patrick"               | "O'Connor, Patrick" (preserves ' and ,) |
| "123-456-7890  "                  | "1234567890" (with Remove Punctuation also enabled) |

Example 2 – Financial Data with Nulls

| amount_raw      | Replace Nulls → 0 + Trim + Remove Punctuation (except .) |
|-----------------|-----------------------------------------------------------|
| null            | 0                                                         |
| "  1,250.00   " | "1250.00"                                                 |
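
One point worth checking before enabling Replace Nulls → 0 on a column that feeds averages: sums are unaffected, but averages are not. The short sketch below shows the difference, assuming the downstream engine skips nulls in aggregates as standard SQL does. The sample amounts are made up for illustration.

```python
from statistics import mean

amounts = [100.0, None, 200.0]

# Nulls skipped, as SQL-style SUM and AVG do.
non_null = [a for a in amounts if a is not None]
# Nulls replaced with 0 first.
zero_filled = [0.0 if a is None else a for a in amounts]

print(sum(non_null), sum(zero_filled))    # 300.0 300.0
print(mean(non_null), mean(zero_filled))  # 150.0 100.0
```

If a missing amount really means "no amount", the zero-filled average is the right one; if it means "unknown", leaving the nulls in place may be the better choice for that column.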

Example 3 – Product Codes with Garbage

| raw_code                  | Remove Letters + Punctuation + All Whitespace |
|---------------------------|-----------------------------------------------|
| "ABC-123!@#XYZ"           | "123"                                         |
| "SKU_456  \t\n"           | "456"                                         |

Example 4 – Email Standardization

| raw_email                     | Lower Case + Trim + Remove Tabs |
|-------------------------------|---------------------------------|
| "  John.Doe@Company.COM  \n"  | "john.doe@company.com"          |

Example 5 – Name Consistency for Matching

| raw_name              | Title Case + Trim + Remove Duplicate Spaces |
|-----------------------|---------------------------------------------|
| "  john   doe "       | "John Doe"                                  |
| "MARY-ANNE SMITH"     | "Mary-Anne Smith"                           |