Union Advanced

Smart Union node that combines multiple DataFrames with full control: union by column name or position, include all columns or only common ones, and automatically handle mismatched schemas with null padding. Perfect for merging monthly files, combining sources with evolving schemas, and building robust incremental pipelines.

Input

It accepts two or more DataFrames as input from the previous nodes.

Output

This node outputs a single DataFrame resulting from the union operation based on the selected configuration.

Type

join

Class

fire.nodes.etl.NodeUnionAdvanced

Fields

Name

Title

Description

allowMissingColumns

Allow Missing Columns

When true → DataFrames with different schemas are automatically aligned by filling missing columns with null values. When false → all input DataFrames must have exactly the same schema (fails otherwise). Recommended: true for real-world incremental loads.
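To make the two settings concrete, here is a minimal pure-Python sketch that models each input only by its list of column names. The function name `missing_columns_report` and the list-of-names representation are hypothetical illustrations, not part of the node. The sketch assumes name-based matching: it reports which columns would be null-padded for each input when Allow Missing Columns is true, and raises when it is false and the column sets differ.

```python
from typing import Dict, List, Sequence


def missing_columns_report(
    schemas: Sequence[List[str]], allow_missing_columns: bool
) -> Dict[int, List[str]]:
    """Map each input index to the columns that would be null-padded for it."""
    # Union of all column names, in first-seen order.
    all_cols: List[str] = []
    for cols in schemas:
        for c in cols:
            if c not in all_cols:
                all_cols.append(c)
    report = {i: [c for c in all_cols if c not in cols] for i, cols in enumerate(schemas)}
    if not allow_missing_columns and any(report.values()):
        # Strict mode (false): any difference in column sets is an error.
        raise ValueError(f"schemas differ; columns missing per input: {report}")
    return report


jan = ["date", "customer_id", "amount"]
feb = ["date", "customer_id", "amount", "promo_code"]
print(missing_columns_report([jan, feb], allow_missing_columns=True))
# {0: ['promo_code'], 1: []}  -> Jan rows get promo_code = null
```

With `allow_missing_columns=False` the same call raises, which mirrors the "fails otherwise" behaviour described above. Reordering the same columns does not count as a difference in this name-based sketch.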

configBy

Configure By

How columns are matched across DataFrames: • NAME → matches by column name (case-sensitive). Safest and most common choice. • POSITION → matches by column order (1st column of each DF lines up, 2nd column, etc.). Uses the column names from the first incoming DataFrame.
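The difference between the two strategies shows up as soon as two inputs list the same columns in a different order. The sketch below is a hypothetical pure-Python model (a "frame" is a pair of column names and row tuples); plain Apache Spark draws the same distinction between `DataFrame.union` (by position) and `DataFrame.unionByName` (by name), but the node's internals are not documented here. As an assumption, both functions reject inputs that do not line up (different column count for POSITION, different name set for NAME).

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: (column names, rows as tuples).
Frame = Tuple[List[str], List[tuple]]


def union_by_position(frames: Sequence[Frame]) -> Frame:
    """Stack frames by column order; output names come from the first frame."""
    out_cols = list(frames[0][0])
    out_rows: List[tuple] = []
    for cols, rows in frames:
        if len(cols) != len(out_cols):
            raise ValueError(f"expected {len(out_cols)} columns, got {len(cols)}")
        out_rows.extend(tuple(r) for r in rows)
    return out_cols, out_rows


def union_by_name(frames: Sequence[Frame]) -> Frame:
    """Stack frames by column name; every frame must have the same set of names."""
    out_cols = list(frames[0][0])
    out_rows: List[tuple] = []
    for cols, rows in frames:
        if set(cols) != set(out_cols):
            raise ValueError(f"column names differ: {cols}")
        # Reorder each row into the first frame's column order.
        idx = [cols.index(c) for c in out_cols]
        out_rows.extend(tuple(r[i] for i in idx) for r in rows)
    return out_cols, out_rows


a = (["id", "amount"], [(1, 10.0)])
b = (["amount", "id"], [(20.0, 2)])  # same columns, different order

print(union_by_position([a, b]))  # (['id', 'amount'], [(1, 10.0), (20.0, 2)])  misaligned
print(union_by_name([a, b]))      # (['id', 'amount'], [(1, 10.0), (2, 20.0)])
```

Positional matching silently places `20.0` under `id`, which is why name matching is the safer default.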

outputFields

Output Fields

Which columns appear in the final result: • ALL → includes every column that exists in any input (missing values become null). Best for full history and audit. • COMMON → keeps only the columns that exist in every input. Great when you want a clean, consistent schema.
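The two Output Fields modes amount to a set union versus a set intersection of the input column names. This is a minimal sketch with a hypothetical `output_columns` helper; the first-seen column ordering it uses is an assumption, since the node's exact output order is not specified here.

```python
from typing import List, Sequence


def output_columns(schemas: Sequence[List[str]], mode: str) -> List[str]:
    """Return the output column list for ALL or COMMON, in first-seen order."""
    seen: List[str] = []
    for cols in schemas:
        for c in cols:
            if c not in seen:
                seen.append(c)
    if mode == "ALL":
        return seen  # union: any column found in any input
    if mode == "COMMON":
        return [c for c in seen if all(c in cols for cols in schemas)]  # intersection
    raise ValueError(f"unknown mode: {mode}")


schemas = [
    ["date", "customer_id", "amount"],
    ["date", "customer_id", "amount", "promo_code"],
    ["date", "customer_id", "amount", "promo_code", "channel"],
]
print(output_columns(schemas, "ALL"))     # ['date', 'customer_id', 'amount', 'promo_code', 'channel']
print(output_columns(schemas, "COMMON"))  # ['date', 'customer_id', 'amount']
```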

Details

Union Advanced Node – The Smart Way to Combine Data

The Union Advanced node stacks DataFrames vertically (row-wise) and tolerates schema differences between its inputs. Use it to append daily files, merge regional exports, or combine sources whose columns change over time, without adding manual column-alignment steps to the pipeline.

Real-World Use Cases

  • Monthly/weekly/daily incremental loads (new columns appear over time)

  • Combining sales data from different regions/countries with slightly different schemas

  • Merging legacy and new systems during migration

  • Building historical tables where schema evolves naturally

  • Preparing data for slowly changing dimensions (SCD)

Key Advantages Over Basic Union

  • Automatically handles missing or extra columns

  • Two matching strategies (by name or position)

  • Option to keep full history (ALL) or enforce strict schema (COMMON)

  • No manual column alignment needed

  • Accepts any number of input branches (two or more), so many sources can be combined in a single node

Best Practices

  • Prefer configBy = NAME; use POSITION only when inputs lack reliable column names (for example, headerless legacy extracts)

  • Use outputFields = ALL + allowMissingColumns = true for audit-ready, future-proof pipelines

  • Use outputFields = COMMON when downstream models or reports require a fixed schema

Examples

Union Advanced – Practical Business Examples

Example 1 – Monthly Sales Files (Schema Evolves Over Time)

Scenario

Jan file has columns: date, customer_id, amount

Feb file adds new column: promo_code

Mar file adds: channel

Configuration

  • Allow Missing Columns: true

  • Configure By: NAME

  • Output Fields: ALL

Result

| date       | customer_id | amount | promo_code | channel |
|------------|-------------|--------|------------|---------|
| 2025-01-01 | 101         | 150    | null       | null    |
| ...        | ...         | ...    | ...        | ...     |
| 2025-02-01 | 205         | 320    | SUMMER10   | null    |
| 2025-03-01 | 310         | 450    | SPRING25   | Online  |

Perfect for building a full historical sales table.
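The table above can be reproduced with a small pure-Python model of the chosen configuration (NAME matching, ALL columns, null padding). The `union_all_by_name` helper and the (columns, rows) frame representation are hypothetical illustrations, not the node's implementation; the sample rows are the ones shown in the table.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: (column names, rows as tuples).
Frame = Tuple[List[str], List[tuple]]


def union_all_by_name(frames: Sequence[Frame]) -> Frame:
    """Stack frames by name, keeping every column and padding gaps with None."""
    out_cols: List[str] = []
    for cols, _ in frames:
        for c in cols:
            if c not in out_cols:
                out_cols.append(c)
    out_rows: List[tuple] = []
    for cols, rows in frames:
        for row in rows:
            values = dict(zip(cols, row))
            # Columns this file does not have become None (null).
            out_rows.append(tuple(values.get(c) for c in out_cols))
    return out_cols, out_rows


jan = (["date", "customer_id", "amount"], [("2025-01-01", 101, 150)])
feb = (["date", "customer_id", "amount", "promo_code"], [("2025-02-01", 205, 320, "SUMMER10")])
mar = (["date", "customer_id", "amount", "promo_code", "channel"],
       [("2025-03-01", 310, 450, "SPRING25", "Online")])

cols, rows = union_all_by_name([jan, feb, mar])
print(cols)  # ['date', 'customer_id', 'amount', 'promo_code', 'channel']
for r in rows:
    print(r)
# ('2025-01-01', 101, 150, None, None)
# ('2025-02-01', 205, 320, 'SUMMER10', None)
# ('2025-03-01', 310, 450, 'SPRING25', 'Online')
```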

Example 2 – Regional Reports (Different Column Order & Extra Columns)

US Report: customer_id, name, state, revenue

EU Report: revenue, customer_name, country, customer_id

APAC Report: customer_id, revenue, region

Configuration

  • Allow Missing Columns: true

  • Configure By: NAME

  • Output Fields: ALL

Result

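All regions are stacked into one DataFrame with columns matched by name; each row gets null in the columns its region does not provide. Name matching is exact, so name and customer_name (or state, country, and region) remain separate columns. Rename equivalent columns upstream if they should end up in the same column.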
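A quick way to preview the resulting schema is to compute the ALL column list from each report's header. This minimal sketch uses hypothetical helpers (`all_columns`, `rename`) and assumes first-seen column ordering; the rename mapping stands in for whatever upstream renaming step your pipeline uses.

```python
from typing import Dict, List, Sequence


def all_columns(schemas: Sequence[List[str]]) -> List[str]:
    """Union of column names in first-seen order (the ALL output schema)."""
    out: List[str] = []
    for cols in schemas:
        for c in cols:
            if c not in out:
                out.append(c)
    return out


def rename(cols: List[str], mapping: Dict[str, str]) -> List[str]:
    """Hypothetical upstream rename so equivalent columns share one name."""
    return [mapping.get(c, c) for c in cols]


us = ["customer_id", "name", "state", "revenue"]
eu = ["revenue", "customer_name", "country", "customer_id"]
apac = ["customer_id", "revenue", "region"]

print(all_columns([us, eu, apac]))
# ['customer_id', 'name', 'state', 'revenue', 'customer_name', 'country', 'region']
print(all_columns([us, rename(eu, {"customer_name": "name"}), apac]))
# ['customer_id', 'name', 'state', 'revenue', 'country', 'region']
```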

Example 3 – Enforce Strict Schema (Data Quality Gate)

You want only the columns that exist in every daily file.

Configuration

  • Allow Missing Columns: true

  • Configure By: NAME

  • Output Fields: COMMON

Result

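Only columns present in every input are kept. Columns missing from any input are dropped from the output rather than raising an error, so this trims the data to a shared schema. To make the pipeline fail on schema drift instead, set Allow Missing Columns to false.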
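If some columns must never be trimmed away, a separate check can turn COMMON into a hard gate. This is a hypothetical sketch, not a node feature: `require_columns` computes the COMMON column list (in the first input's order) and raises if any column you list as required would be dropped. The sample daily schemas reuse the sales columns from Example 1.

```python
from typing import List, Sequence


def common_columns(schemas: Sequence[List[str]]) -> List[str]:
    """Columns present in every input, in the first input's order."""
    return [c for c in schemas[0] if all(c in cols for cols in schemas[1:])]


def require_columns(schemas: Sequence[List[str]], required: List[str]) -> List[str]:
    """Hypothetical gate: fail if a required column would be dropped by COMMON."""
    kept = common_columns(schemas)
    lost = [c for c in required if c not in kept]
    if lost:
        raise ValueError(f"required columns not present in every input: {lost}")
    return kept


day1 = ["date", "customer_id", "amount", "promo_code"]
day2 = ["date", "customer_id", "amount"]
print(require_columns([day1, day2], required=["date", "customer_id", "amount"]))
# ['date', 'customer_id', 'amount']  (promo_code is trimmed, which is allowed here)
```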

Example 4 – Legacy Positional Files (Old Systems)

Old COBOL extracts have fixed positions and no headers; the columns always appear in the same order.

Configuration

  • Allow Missing Columns: false

  • Configure By: POSITION

  • Output Fields: ALL

Result

Columns are aligned purely by position, using the column names from the first input.
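Headerless files often arrive with generic names (Spark's CSV reader, for instance, names columns `_c0`, `_c1`, … when no header is read), so only position carries meaning. The sketch below is a hypothetical pure-Python model of this configuration: names come from the first input, and, as an assumption matching Allow Missing Columns = false, any input with a different column count is rejected. The account data is made up for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: (column names, rows as tuples).
Frame = Tuple[List[str], List[tuple]]


def union_by_position_strict(frames: Sequence[Frame]) -> Frame:
    """Stack frames by column position; names come from the first frame."""
    out_cols = list(frames[0][0])
    out_rows: List[tuple] = []
    for i, (cols, rows) in enumerate(frames):
        # Strict positional mode: every extract must have the same column count.
        if len(cols) != len(out_cols):
            raise ValueError(f"input {i} has {len(cols)} columns, expected {len(out_cols)}")
        out_rows.extend(rows)
    return out_cols, out_rows


# The first extract was given names; a later headerless read got generic ones.
named = (["acct_no", "branch", "balance"], [("000123", "NY01", 1500)])
headerless = (["_c0", "_c1", "_c2"], [("000456", "LA02", 820)])

cols, rows = union_by_position_strict([named, headerless])
print(cols)  # ['acct_no', 'branch', 'balance']
print(rows)  # [('000123', 'NY01', 1500), ('000456', 'LA02', 820)]
```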

Example 5 – Combining 12 Monthly Branches into Yearly Table

12 separate pipelines (one per month), all connected to a single Union Advanced node.

Configuration

  • Allow Missing Columns: true

  • Configure By: NAME

  • Output Fields: ALL

Result

One clean yearly DataFrame with evolving schema preserved over time.
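Conceptually, one union over 12 inputs gives the same rows and columns as folding binary unions from left to right. In plain PySpark that fold is often written as `functools.reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), dfs)` (Spark 3.1+); whether the node works this way internally is not documented here. The pure-Python sketch below uses a hypothetical `union_two` helper and made-up monthly data in which a `channel` column appears from month 7 onward.

```python
from functools import reduce
from typing import List, Tuple

# Hypothetical stand-in for a DataFrame: (column names, rows as tuples).
Frame = Tuple[List[str], List[tuple]]


def union_two(left: Frame, right: Frame) -> Frame:
    """Binary union by name with null padding (ALL + allow missing columns)."""
    out_cols = list(left[0]) + [c for c in right[0] if c not in left[0]]
    rows: List[tuple] = []
    for cols, src in (left, right):
        for row in src:
            values = dict(zip(cols, row))
            rows.append(tuple(values.get(c) for c in out_cols))
    return out_cols, rows


# Twelve hypothetical monthly branches; "channel" appears from month 7 on.
months: List[Frame] = []
for m in range(1, 13):
    cols = ["month", "amount"] + (["channel"] if m >= 7 else [])
    row = (m, 100 * m) + (("Online",) if m >= 7 else ())
    months.append((cols, [row]))

year_cols, year_rows = reduce(union_two, months)
print(year_cols)       # ['month', 'amount', 'channel']
print(len(year_rows))  # 12
print(year_rows[0])    # (1, 100, None)
print(year_rows[-1])   # (12, 1200, 'Online')
```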