Select

The all-in-one column selector: pick, rename, cast, drop, and propagate columns with precise control over the output schema. Useful for cleaning messy inputs, preparing datasets for BI tools, models, or downstream pipelines, and enforcing consistent schemas.

Input

It takes a DataFrame as input.

Output

DataFrame with selected, renamed, and optionally type-cast columns.

Type

transform

Class

fire.nodes.etl.NodeSelect

Fields

| Name | Title | Description |
|------|-------|-------------|
| General | General | |
| inputCols | Columns | Columns you explicitly want to keep in the output. Drag to reorder – the order here becomes the final column order (super useful for reports!). |
| renameCols | Rename | New names for the selected columns. Leave blank to keep original name. Example: old_name → CustomerID, AmountUSD → Revenue. |
| colType | Change Data Type | Force cast each selected column to the correct type. Critical for fixing string → date, string → integer issues from CSV/Excel sources. |
| drop | Drop & Options | |
| dropInputCols | Drop Columns | Columns you want to completely remove from the output. Great for PII, internal IDs, or junk fields. |
| inputColumnPropagation | Enable Input Column Propagation | When true → all columns NOT listed in ‘Columns’ and NOT in ‘Drop Columns’ are automatically passed through. Perfect when you only want to rename/cast a few columns and keep everything else untouched. |
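
How these fields combine can be modeled in a few lines of plain Python. This is only a sketch of the documented behavior, not the node's actual implementation: the function name `resolve_output_columns` and the sample column names are hypothetical, and it assumes that selected columns come first in the listed order, followed by propagated columns in their original input order.

```python
from typing import List, Optional, Tuple


def resolve_output_columns(
    input_cols: List[str],
    columns: List[str],
    rename: Optional[List[str]] = None,
    types: Optional[List[str]] = None,
    drop: Optional[List[str]] = None,
    propagate: bool = False,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (source_column, output_name, cast_type) tuples in output order.

    cast_type is None when the column keeps its original type.
    """
    rename = rename or [""] * len(columns)
    types = types or [""] * len(columns)
    dropped = set(drop or [])

    # Explicitly selected columns come first, in the order listed.
    # A blank rename keeps the original name; a blank type means no cast.
    plan = [
        (src, new or src, typ or None)
        for src, new, typ in zip(columns, rename, types)
    ]

    # With propagation on, every remaining input column that is not
    # dropped passes through unchanged (assumed: in input order).
    if propagate:
        selected = set(columns)
        plan += [
            (c, c, None)
            for c in input_cols
            if c not in selected and c not in dropped
        ]
    return plan


# Hypothetical four-column input: rename and cast one column, drop one.
plan = resolve_output_columns(
    input_cols=["id", "name", "amount", "debug"],
    columns=["amount"],
    rename=["Revenue"],
    types=["DOUBLE"],
    drop=["debug"],
    propagate=True,
)
print(plan)
```

With propagation switched off, the same call would return only the `amount` entry, since nothing outside ‘Columns’ is carried over.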

Details

Select Node – Your DataFrame Column Superpower

The Select node is one of the most frequently used transforms in real-world pipelines. It covers what you expect from a modern “Select / Rename / Cast / Drop” tool: drag-and-drop ordering, optional pass-through of untouched columns, and explicit control over output names and types.

Why You’ll Use It Every Day

  • Clean up messy source files (CSV, Excel, JSON)

  • Enforce consistent column names & types across environments

  • Prepare perfect datasets for Tableau, Power BI, Looker

  • Remove PII before sharing or writing to data lakes

  • Reorder columns exactly how business wants them in reports

  • Fix common ingestion issues (dates stored as string, numbers as string)

Key Advantages

  • Visual drag-and-drop column ordering

  • One-click rename + type cast

  • Smart propagation (only touch what you need to change)

  • Explicit drop list for sensitive columns

  • Familiar select/rename/cast workflow, similar to Alteryx, dbt, or Tableau Prep, but running at Spark scale

Pro Tips

  • Always turn ON “Enable Input Column Propagation” when you only need to fix a few columns

  • Use it right after Read nodes to standardize incoming data

  • Combine with Schema Enforcement downstream for bulletproof pipelines

Examples

Select Node – Real-World Business Examples

Example 1 – Clean Raw CSV for Reporting

Raw Input (messy)

| customer_id_str | full_name         | order_total_str | order_date_str | junk_col | _c5 |
|-----------------|-------------------|-----------------|----------------|----------|-----|
| 1001            | John Doe          | 1250.50         | 2025-01-15     | temp     | xyz |

Configuration

  • Columns: customer_id_str, full_name, order_total_str, order_date_str

  • Rename: CustomerID, CustomerName, Revenue, OrderDate

  • Change Data Type: INTEGER, STRING, DOUBLE, DATE

  • Drop Columns: junk_col, _c5

  • Enable Input Column Propagation: false

Clean Output

| CustomerID | CustomerName | Revenue | OrderDate  |
|------------|--------------|---------|------------|
| 1001       | John Doe     | 1250.50 | 2025-01-15 |
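
For readers who think in Spark SQL, the configuration above corresponds roughly to one cast-and-alias expression per selected column. The sketch below only builds those expression strings in plain Python. It is an illustration of the equivalent logic, not what the node runs internally.

```python
# Values copied from the Example 1 configuration.
columns = ["customer_id_str", "full_name", "order_total_str", "order_date_str"]
rename = ["CustomerID", "CustomerName", "Revenue", "OrderDate"]
types = ["INTEGER", "STRING", "DOUBLE", "DATE"]

# One Spark SQL expression per selected column: cast first, then alias.
# Backticks quote identifiers in Spark SQL.
select_exprs = [
    f"CAST(`{src}` AS {typ}) AS `{new}`"
    for src, new, typ in zip(columns, rename, types)
]

for expr in select_exprs:
    print(expr)
```

In PySpark, strings like these could be passed to `DataFrame.selectExpr`. Because junk_col and _c5 are not in the list and propagation is off, they never reach the output.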

Example 2 – Only Fix a Few Columns, Keep the Rest

You have 200 columns but only need to rename 3 of them, 2 of which are dates that also need casting.

Configuration

  • Columns: legacy_id, transaction_date_str, close_date_str

  • Rename: CustomerID, TransactionDate, CloseDate

  • Change Data Type: STRING, DATE, DATE

  • Enable Input Column Propagation: true

  • Drop Columns: temp_flag, debug_info

Result

The output has 198 columns: your 3 fixed ones plus the 195 that pass through untouched (200 inputs minus the 3 selected and the 2 dropped).
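
The column counts for this example can be checked with a short sketch. It assumes that temp_flag and debug_info are among the 200 input columns. The `other_NNN` names are hypothetical placeholders for the remaining columns.

```python
# Hypothetical 200-column input: the five columns named in the example
# plus 195 placeholder columns (other_000 .. other_194).
named = ["legacy_id", "transaction_date_str", "close_date_str",
         "temp_flag", "debug_info"]
input_cols = named + [f"other_{i:03d}" for i in range(195)]

columns = ["legacy_id", "transaction_date_str", "close_date_str"]
rename = ["CustomerID", "TransactionDate", "CloseDate"]
drop = {"temp_flag", "debug_info"}

# Propagation: everything that is neither selected nor dropped.
propagated = [c for c in input_cols if c not in columns and c not in drop]
output_cols = rename + propagated

print(len(input_cols), len(propagated), len(output_cols))  # 200 195 198
```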

Example 3 – Prepare Perfect Tableau Extract

Goal: Exact column order & names expected by dashboard

Configuration

  • Columns: Region, Country, ProductLine, Revenue, Profit, OrderDate, CustomerSegment

  • Rename: (leave blank, the names are already correct)

  • Change Data Type: STRING, STRING, STRING, DOUBLE, DOUBLE, DATE, STRING

  • Enable Input Column Propagation: false

  • Drag to exact order required by viz

Result

The output already has the column names, types, and order the dashboard expects, so no further prep is needed in Tableau.

Example 4 – Remove PII Before Sharing

Configuration

  • Enable Input Column Propagation: true

  • Drop Columns: SSN, FullName, Email, Phone, Address, CreditCard

Result

All sensitive fields stripped, everything else preserved.
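
A drop list is easy to get subtly wrong (a misspelled or differently cased name simply will not match), so it can be worth checking the resulting schema before sharing. The sketch below models the drop in plain Python and adds such a check. Only the PII names come from the example: the other columns are hypothetical, and the exact, case-sensitive name matching is an assumption of the sketch rather than documented node behavior.

```python
pii = {"SSN", "FullName", "Email", "Phone", "Address", "CreditCard"}

# Hypothetical input schema; only the PII names come from the example.
input_cols = ["OrderID", "SSN", "FullName", "Email", "Phone",
              "Address", "CreditCard", "OrderTotal", "OrderDate"]

# No explicit Columns list and propagation on: keep everything not dropped.
output_cols = [c for c in input_cols if c not in pii]

# Guard to run before sharing: fail loudly if any PII name survives.
leaked = pii.intersection(output_cols)
assert not leaked, f"PII columns still present: {sorted(leaked)}"

print(output_cols)  # ['OrderID', 'OrderTotal', 'OrderDate']
```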

Example 5 – Reorder Columns for Excel Export

Business wants: Customer Name first, then ID, then everything else

Configuration

  • Columns: CustomerName, CustomerID → place at top

  • Enable Input Column Propagation: true

Result

Excel opens with the two key columns first — exactly as requested.