Data Cleansing¶

A one-stop data quality node that cleans many common data issues in a single step: null handling, whitespace, unwanted characters, case standardization, and more. Well suited to preparing raw source data (CSV, Excel, APIs, logs) before analytics, modeling, or reporting.

Input¶

It accepts a DataFrame as input from the previous node.

Output¶

This node outputs the cleansed data.

Type¶

transform

Class¶

fire.nodes.etl.NodeDataCleansing

Fields¶

Name	Title	Description
Columns	Columns
inputCols	Select Columns	Columns you want to clean. Leave empty to apply settings globally where it makes sense (e.g., trim whitespace on all string columns).
Remove Nulls	Remove Nulls
removeNullRows	Remove Null Rows	Drop entire rows that contain nulls in the selected columns (or any column if none selected). Great for strict data quality requirements.
removeNullColumns	Remove Null Columns	Drop entire columns that are completely null/empty. Useful after Union when some sources don’t have certain fields.
Replace Nulls	Replace Nulls
replaceWithBlanks	Replace Nulls → Blank (String fields)	Replace nulls in string columns with empty string ‘’ instead of literal ‘null’. Makes downstream joins and reports look clean.
replaceWithZero	Replace Nulls → 0 (Numeric fields)	Replace nulls in numeric columns with 0. Useful when nulls would otherwise propagate through arithmetic or be skipped by aggregations (sum, avg). Note that the inserted zeros count toward averages.
Remove Unwanted Characters	Remove Unwanted Characters
trimWhitespace	Trim Leading/Trailing Whitespace	Remove spaces before/after text (e.g., ‘ John ’ → ‘John’). One of the most common data issues.
removeTabsLineBreaks	Remove Tabs, Line Breaks & Duplicate Spaces	Clean up copy-paste mess: replaces \t, \n, \r and multiple spaces with a single space.
allWhiteSpace	All Whitespace Characters	Remove every kind of whitespace (including non-breaking spaces).
letters	Letters (A-Z, a-z)	Strip all letters – useful for extracting numbers from mixed fields.
lettersExceptions	Letters Exceptions	Comma-separated letters to KEEP (e.g., ‘A,E,I,O,U’ to keep vowels).
numbers	Numbers (0-9)	Strip all digits – perfect for cleaning names that contain numbers.
numbersExceptions	Numbers Exceptions	Digits to KEEP (e.g., ‘123’ to preserve house numbers).
punctuation	Punctuation & Symbols	Remove !”#$%&’()*+,-./:;<=>?@[]^_`{\|}~ etc.
punctuationExceptions	Punctuation Exceptions	Symbols to KEEP (e.g., ‘.,-’ for decimal numbers and hyphenated names; add ’ to keep names like O’Connor).
Modify Case	Modify Case
modifyCase	Modify Case	Standardize text case: • Upper case → JOHN DOE • Lower Case → john doe • Title Case → John Doe • Default → no change
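
The "remove with exceptions" fields above can be pictured with a minimal sketch in plain Python. This is an illustration of the idea only, not the node's implementation: `remove_chars` is a hypothetical helper, exceptions are passed as a plain string of characters, and how the node itself parses or case-matches the exception fields is not specified here.

```python
import string

def remove_chars(text, remove, keep=""):
    """Drop characters of `text` found in `remove`, unless they are in `keep`."""
    drop = set(remove) - set(keep)
    return "".join(ch for ch in text if ch not in drop)

# letters + lettersExceptions: strip letters but keep vowels (assumed case-sensitive)
vowels_only = remove_chars("Order A1", string.ascii_letters, keep="AEIOUaeiou")

# numbers: strip every digit from a name
name = remove_chars("J0hn D03", string.digits)

# punctuation + punctuationExceptions: keep "." and "'"
text = remove_chars("O'Connor: 1,250.00!", string.punctuation, keep=".'")

print(vowels_only)  # Oe A1
print(name)         # Jhn D
print(text)         # O'Connor 1250.00
```

Whitespace is left untouched in this sketch, which mirrors the table above, where whitespace handling has its own options.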

Details¶

Data Cleansing Node – Fix 95% of Real-World Data Mess in One Click¶

The Data Cleansing node is the fastest way to turn dirty, inconsistent source data into clean, trusted, analysis-ready tables. Used by thousands of analysts daily to eliminate the most common (and frustrating) data quality issues instantly.

Real-World Problems It Solves Instantly¶

CSV/Excel files with extra spaces, tabs, line breaks
Nulls that appear as blanks, the literal string “null”, or actual nulls, breaking sums
Mixed case names/emails (JoHn.DoE@company.com)
Phone numbers with mixed formatting: (123) 456-7890, dashes, spaces
Product codes with hidden characters
Copied data from PDFs/websites with garbage symbols

Best Practice Combinations¶

Standard Clean Profile (most common):

Select all string columns
Trim Whitespace: true
Remove Tabs/Line Breaks: true
Replace Nulls → Blank: true
Title Case
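
As a rough sketch, the Standard Clean Profile corresponds to the following per-value steps in plain Python. The function name `standard_clean` is hypothetical, and Python's `str.title()` stands in for the node's Title Case option, whose exact rules may differ.

```python
import re

def standard_clean(value):
    """Apply the Standard Clean Profile steps to one string value (sketch)."""
    if value is None:
        return ""                                # Replace Nulls → Blank
    text = re.sub(r"[ \t\r\n]+", " ", value)     # tabs, line breaks, duplicate spaces → one space
    text = text.strip()                          # Trim Whitespace
    return text.title()                          # Title Case (Python's rules, an assumption)

print(repr(standard_clean("  john DOE  ")))      # 'John Doe'
print(repr(standard_clean("  john \t  doe\n")))  # 'John Doe'
print(repr(standard_clean(None)))                # ''
```

Python's `str.title()` capitalizes the letter after any non-letter, so "they're" becomes "They'Re". Check how the node's Title Case treats apostrophes before relying on it for names.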

Phone/Email Clean:

Remove Punctuation (except @ and .)
Trim + Lower Case
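
The Phone/Email profile can be sketched the same way. The helper names below are hypothetical, and the extra whitespace removal for phone numbers is an assumption that corresponds to the All Whitespace option rather than to Trim.

```python
import string

def strip_punctuation(value, keep=""):
    """Delete ASCII punctuation from `value`, except characters listed in `keep`."""
    drop = "".join(ch for ch in string.punctuation if ch not in keep)
    return value.translate(str.maketrans("", "", drop))

def clean_email(value):
    # Remove Punctuation (except @ and .), then Trim + Lower Case
    return strip_punctuation(value, keep="@.").strip().lower()

def clean_phone(value):
    # Remove all punctuation, then drop every whitespace character
    return "".join(strip_punctuation(value).split())

print(clean_email("  <John.Doe@Company.COM>  \n"))  # john.doe@company.com
print(clean_phone("(123) 456-7890"))                # 1234567890
```

Characters such as `_`, `-` and `+` are valid in email addresses, so keeping only `@` and `.` would alter an address like `jane_smith@example.com`. Add them to the exceptions when the data contains such addresses.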

Numeric Clean:

Replace Nulls → 0
Remove Letters + Punctuation
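
A minimal sketch of the Numeric Clean steps follows, written in plain Python with a hypothetical `numeric_clean` helper. It assumes that `.` and `-` are listed as punctuation exceptions; without them the decimal point and sign are stripped along with everything else. The second half shows how replacing nulls with 0 changes an average.

```python
import string
from statistics import mean

def numeric_clean(value, keep=".-"):
    """Null → 0, then remove letters and punctuation except `keep` (sketch)."""
    if value is None:
        return 0.0
    drop = set(string.ascii_letters + string.punctuation) - set(keep)
    cleaned = "".join(ch for ch in value if ch not in drop).strip()
    return float(cleaned) if cleaned else 0.0

print(numeric_clean("  1,250.00   "))      # 1250.0
print(numeric_clean("USD -42.5"))          # -42.5
print(numeric_clean(None))                 # 0.0
print(numeric_clean("1,250.00", keep=""))  # 125000.0 (decimal point lost)

# Effect of Replace Nulls → 0 on an average
amounts = [10.0, None, 20.0]
skip_nulls = mean([v for v in amounts if v is not None])
zero_filled = mean([0.0 if v is None else v for v in amounts])
print(skip_nulls, zero_filled)             # 15.0 10.0
```

Whether a zero-filled average is the right number depends on whether a null means "no amount" or "unknown amount", so it is worth deciding that before enabling the option.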

Pro Tips¶

Run this node right after any Read/Union node
Follow it with a “Select” node to drop or reorder columns
Use “Remove Null Columns” after Union Advanced to clean up schema drift
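
To illustrate the schema-drift tip, here is a small plain-Python sketch of what "Remove Null Columns" means for rows produced by a union. The rows are modeled as dictionaries, `drop_null_columns` is a hypothetical helper, and treating empty strings as empty values is an assumption based on the "completely null/empty" wording in the field description.

```python
def is_empty(value):
    """Treat None and the empty string as empty (an assumption)."""
    return value is None or value == ""

def drop_null_columns(rows):
    """Return the rows without columns that are empty in every row."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)      # preserve first-seen column order
    live = [c for c in columns if any(not is_empty(r.get(c)) for r in rows)]
    return [{c: r.get(c) for c in live} for r in rows]

# After a union, "fax" exists in the schema but no source supplied a value.
unioned = [
    {"id": 1, "name": "Ann", "fax": None},
    {"id": 2, "name": "", "fax": None},
]
print(drop_null_columns(unioned))
# [{'id': 1, 'name': 'Ann'}, {'id': 2, 'name': ''}]
```

The "name" column survives because at least one row has a value, even though another row is blank.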

Examples¶

Data Cleansing – Before & After Real Examples¶

Example 1 – Typical Messy Customer Import¶

| Raw Data                          | After Standard Clean Profile |
|-----------------------------------|------------------------------|
| "  john DOE  "                    | "John Doe"                   |
| null                              | "" (blank)                   |
| "jane.smith@Company.com\n"        | "jane.smith@company.com" (using Lower Case instead of Title Case) |
| "O'Connor, Patrick"               | "O'Connor, Patrick" (preserves ' and ,) |
| "123-456-7890  "                  | "1234567890" (with Remove Punctuation added) |

Example 2 – Financial Data with Nulls¶

| amount_raw      | Replace Nulls → 0 + Trim + Remove Punctuation (except .) |
|-----------------|----------------------------------------------------------|
| null            | 0                                                        |
| "  1,250.00   " | "1250.00"                                                |

Example 3 – Product Codes with Garbage¶

| raw_code                  | Remove Letters + Punctuation + All Whitespace |
|---------------------------|-----------------------------------------------|
| "ABC-123!@#XYZ"           | "123"                                         |
| "SKU_456  \t\n"           | "456"                                         |

Example 4 – Email Standardization¶

| raw_email                     | Lower Case + Trim + Remove Tabs/Line Breaks |
|-------------------------------|---------------------------------------------|
| "  John.Doe@Company.COM  \n"  | "john.doe@company.com"                      |

Example 5 – Name Consistency for Matching¶

| raw_name              | Title Case + Trim + Remove Duplicate Spaces |
|-----------------------|---------------------------------------------|
| "  john   doe "       | "John Doe"                                  |
| "MARY-ANNE SMITH"     | "Mary-Anne Smith"                           |