DeltaMerge

Insert, delete and update data using the Delta merge command.

Type

transform

Class

fire.nodes.etl.NodeDeltaMerge

Fields

Name

Title

Description

targetLocation

TargetPath

Path of the target location

sourceAlias

Source Alias

Alias to use for the source DataFrame

targetAlias

Target Alias

Alias to use for existing target Delta table

mergeCondition

Merge Condition

Condition to merge data from source DataFrame to target table, which would be used to perform update, delete, or insert actions as specified.

whenMatchedUpdate

When Matched Update

whenMatchedUpdateAction

WhenMatched Update Action

Update the row from Source that already exists in Target based on Merge Condition.

whenMatchedUpdateCondition

Merge Condition

Optional additional condition for updating row. If specified then it must evaluate to true for the row to be updated.

whenMatchedUpdateTargetColumns

Target Columns

Target Column

whenMatchedUpdateExpressions

Expressions

Replace default update with expression

whenMatchedDelete

When Matched Delete

whenMatchedDeleteAction

WhenMatched Delete Action

Delete rows if Merge Condition and the optional additional condition evaluates to true.

whenMatchedDeleteCondition

WhenMatched Delete Condition

Optional additional condition for deleting row. If a condition is specified then it must evaluate to true for the row to be deleted.

whenNotMatched

When Not Matched

whenNotMatchedAction

WhenNot Matched Action

The action to perform if the row from Source is not present in Target based on Merge Condition

whenNotMatchedCondition

WhenNot Matched Condition

Optional condition for inserting row. If a condition is specified then it must evaluate to true for the row to be updated.

whenNotMatchedTargetColumns

Target Columns

Target Column

whenNotMatchedExpressions

Expressions

Replace default insert with expression

Details

Delta Merge Node Details

Insert, delete and update data using the Delta merge command.

Parameters to be set:

General Tab:

  • OUTPUT STORAGE LEVEL: Choose how the data should be stored (e.g., DEFAULT, DISK_ONLY).

  • TARGET PATH: Specify the directory where the Delta Lake table is located. Use the “Browse” button to navigate and select the location.

  • SOURCE ALIAS: Provide an alias for the source DataFrame (e.g., “source”). This alias will be used in the merge condition.

  • TARGET ALIAS: Provide an alias for the target Delta Lake table (e.g., “target”). This alias will be used in the merge condition.

  • MERGE CONDITION: Specify the condition used to match rows between the source DataFrame and the target Delta Lake table. This condition is a SQL expression that typically involves comparing columns from the source and target.

When Matched Update Tab:

  • Update Columns: Specify which columns in the target table should be updated based on matching conditions. Define the mappings between source and target columns.

When Matched Delete Tab:

  • Delete Condition: Define the condition for deleting records from the target table. Records satisfying this condition will be deleted.

When Not Matched Tab:

  • Insert Columns: Specify which columns should be inserted into the target table when no match is found. Define the mappings for the inserted values.

Examples

Delta Merge Node Examples

Example of Merging Data into a Delta Lake Table

General Tab:

  • OUTPUT STORAGE LEVEL: DEFAULT

  • TARGET PATH: /user/hive/warehouse/my_delta_table

  • SOURCE ALIAS: source

  • TARGET ALIAS: target

  • MERGE CONDITION: source.customer_id = target.customer_id

When Matched Update Tab:

  • Update Columns: target.name = source.name, target.age = source.age

When Matched Delete Tab:

  • Delete Condition: source.status = ‘inactive’

When Not Matched Tab:

  • Insert Columns: (customer_id, name, age, status)

  • Values to Insert: (source.customer_id, source.name, source.age, ‘new’)

This configuration would merge data from the source DataFrame (source) into the Delta Lake table located at /user/hive/warehouse/my_delta_table (target). Matching rows update name and age, inactive rows are deleted, and new rows are inserted with the specified columns and values.