Data Cleansing
==============

A one-stop data quality node that cleans up the most common messy-data issues: null handling, whitespace, unwanted characters, case standardization, and more. Well suited to preparing raw source data (CSV, Excel, APIs, logs) before analytics, modeling, or reporting.

Input
--------------

This node accepts a DataFrame as input from the previous node.

Output
--------------

This node outputs the cleansed data.

Type
---------

transform

Class
---------

fire.nodes.etl.NodeDataCleansing

Fields
---------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - Columns
     - Columns
     -
   * - inputCols
     - Select Columns
     - Columns you want to clean. Leave empty to apply settings globally where it makes sense (e.g., trim whitespace on all string columns).
   * - Remove Nulls
     - Remove Nulls
     -
   * - removeNullRows
     - Remove Null Rows
     - Drop entire rows that contain nulls in the selected columns (or in any column if none are selected). Useful for strict data quality requirements.
   * - removeNullColumns
     - Remove Null Columns
     - Drop entire columns that are completely null/empty. Useful after a Union when some sources don’t have certain fields.
   * - Replace Nulls
     - Replace Nulls
     -
   * - replaceWithBlanks
     - Replace Nulls → Blank (String fields)
     - Replace nulls in string columns with an empty string (``''``) instead of the literal 'null', so downstream joins and reports look clean.
   * - replaceWithZero
     - Replace Nulls → 0 (Numeric fields)
     - Replace nulls in numeric columns with 0, so missing values are counted as zero in aggregations (sum, avg).
   * - Remove Unwanted Characters
     - Remove Unwanted Characters
     -
   * - trimWhitespace
     - Trim Leading/Trailing Whitespace
     - Remove spaces before and after text (e.g., ' John ' → 'John'). One of the most common data issues.
   * - removeTabsLineBreaks
     - Remove Tabs, Line Breaks & Duplicate Spaces
     - Clean up copy-paste mess: replaces ``\t``, ``\n``, ``\r``, and multiple spaces with a single space.
   * - allWhiteSpace
     - All Whitespace Characters
     - Remove every kind of whitespace (including non-breaking spaces).
   * - letters
     - Letters (A-Z, a-z)
     - Strip all letters. Useful for extracting numbers from mixed fields.
   * - lettersExceptions
     - Letters Exceptions
     - Comma-separated letters to KEEP (e.g., 'A,E,I,O,U' to keep vowels).
   * - numbers
     - Numbers (0-9)
     - Strip all digits. Useful for cleaning names that contain numbers.
   * - numbersExceptions
     - Numbers Exceptions
     - Digits to KEEP (e.g., '123' to preserve house numbers).
   * - punctuation
     - Punctuation & Symbols
     - Remove punctuation and symbols such as ``!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~``.
   * - punctuationExceptions
     - Punctuation Exceptions
     - Symbols to KEEP (e.g., ``.,-`` for decimal numbers and hyphenated names, or ``'`` for names like O'Connor).
   * - Modify Case
     - Modify Case
     -
   * - modifyCase
     - Modify Case
     - Standardize text case:

       * Upper Case → JOHN DOE
       * Lower Case → john doe
       * Title Case → John Doe
       * Default → no change

Details
-------

Data Cleansing Node – Fix 95% of Real-World Data Mess in One Click
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The Data Cleansing node is the fastest way to turn dirty, inconsistent source data into clean, trusted, analysis-ready tables. It is used by thousands of analysts daily to eliminate the most common (and frustrating) data quality issues.

Real-World Problems It Solves Instantly
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

* CSV/Excel files with extra spaces, tabs, and line breaks
* Nulls that show up as blanks, the literal string “null”, or actual nulls, breaking sums
* Mixed-case names and emails (JoHn.DoE@company.com)
* Phone numbers with inconsistent formatting, such as (123) 456-7890, dashes, or spaces
* Product codes with hidden characters
* Data copied from PDFs or websites with garbage symbols

Best Practice Combinations
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1. Standard Clean Profile (most common):

   * Select all string columns
   * Trim Whitespace: true
   * Remove Tabs/Line Breaks: true
   * Replace Nulls → Blank: true
   * Modify Case: Title Case

2. Phone/Email Clean:

   * Remove Punctuation (except @ and .)
   * Trim + Lower Case

3. Numeric Clean:

   * Replace Nulls → 0
   * Remove Letters + Punctuation

Pro Tips
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

* Run this node right after any Read/Union node
* Follow it with a “Select” node to drop or reorder columns
* Use “Remove Null Columns” after Union Advanced to clean up schema drift

Examples
--------

Data Cleansing – Before & After Real Examples
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Example 1 – Typical Messy Customer Import
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

::

    | Raw Data                    | After Standard Clean Profile                      |
    |-----------------------------|---------------------------------------------------|
    | " john DOE "                | "John Doe"                                        |
    | null                        | "" (blank)                                        |
    | "jane.smith@Company.com\n"  | "jane.smith@company.com" (with Lower Case)        |
    | "O'Connor, Patrick"         | "O'Connor, Patrick" (preserves ' and ,)           |
    | "123-456-7890 "             | "1234567890" (with Remove Punctuation)            |

Example 2 – Financial Data with Nulls
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

::

    | amount_raw   | Replace Nulls → 0 + Trim + Remove Punctuation (except .) |
    |--------------|----------------------------------------------------------|
    | null         | 0                                                        |
    | " 1,250.00 " | "1250.00"                                                |

Example 3 – Product Codes with Garbage
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

::

    | raw_code          | Remove Letters + Punctuation + Trim |
    |-------------------|-------------------------------------|
    | "ABC-123!@#XYZ"   | "123"                               |
    | "SKU_456 \t\n"    | "456"                               |

Example 4 – Email Standardization
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

::

    | raw_email                  | Lower Case + Trim + Remove Tabs |
    |----------------------------|---------------------------------|
    | " John.Doe@Company.COM \n" | "john.doe@company.com"          |

Example 5 – Name Consistency for Matching
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

::

    | raw_name            | Title Case + Trim + Remove Duplicate Spaces |
    |---------------------|---------------------------------------------|
    | "  john   doe  "    | "John Doe"                                  |
    | "MARY-ANNE SMITH"   | "Mary-Anne Smith"                           |
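
Illustrative Sketch – Standard Clean Profile in Python
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To make the effect of the Standard Clean Profile concrete, the sketch below applies the same four settings (Replace Nulls → Blank, Remove Tabs/Line Breaks/Duplicate Spaces, Trim Whitespace, Title Case) to single values in plain Python. It is not the node's implementation: the node operates on DataFrames, and the helper names ``collapse_whitespace``, ``title_case``, and ``standard_clean`` are hypothetical. The title-casing rule used here (capitalize every run of letters) is an assumption, chosen so that values such as "MARY-ANNE SMITH" and "O'Connor" come out as shown in the examples above.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; not the node's actual code.


def collapse_whitespace(text: str) -> str:
    """Replace tabs, line breaks, and runs of spaces with a single space."""
    return re.sub(r"[\t\r\n ]+", " ", text)


def title_case(text: str) -> str:
    """Capitalize each run of ASCII letters (assumed rule, see lead-in)."""
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), text)


def standard_clean(value: Optional[str]) -> str:
    """Apply the Standard Clean Profile settings to one string value."""
    if value is None:
        return ""  # Replace Nulls -> Blank
    cleaned = collapse_whitespace(value)  # Remove Tabs/Line Breaks/Duplicate Spaces
    cleaned = cleaned.strip()  # Trim Leading/Trailing Whitespace
    return title_case(cleaned)  # Modify Case: Title Case


if __name__ == "__main__":
    for raw in [" john DOE ", None, "MARY-ANNE  SMITH\t\n"]:
        print(repr(raw), "->", repr(standard_clean(raw)))
```

Collapsing whitespace before trimming means a trailing line break is first turned into a space and then removed by the trim step. Capitalizing every run of letters also capitalizes after apostrophes, so a value like "don't" would not be title-cased the way a reader might expect; a production rule would need to handle such cases explicitly.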