Hive Incremental

This node incrementally reads data from a Hive table.

Output

It creates a DataFrame from the selected Hive table containing the latest data.

Type

dataset

Class

fire.nodes.etl.NodeHiveIncremental

Fields

| Name | Title | Description |
|---|---|---|
| database | HIVE Database | HIVE Database |
| table | HIVE Table | HIVE Table from which to read the data |
| path | Watermark File Path | Path of the watermark file. |
| filterFields | Incremental Load Fields | Comma separated values of field names used in data filter for the incremental load. |
| outputColNames | Column Names of the database table | Column Names of the database table |
| outputColTypes | Column Types of the database table | Column Types of the database table |

Details

Hive Incremental Node Details

This node reads a table from Hive incrementally and creates a DataFrame containing the schema and data of the specified table. The incremental load is controlled by the watermark file and the incremental load fields described below.

Parameters to be set:

  • OUTPUT STORAGE LEVEL: Keep this as DEFAULT.

  • HIVE DATABASE: Specify the Hive database from which data is to be read.

  • HIVE TABLE: Specify the table in the Hive database from which data is to be read incrementally.

  • WATERMARK FILE PATH: Specify the path of the watermark file, which tracks the last load timestamp.

  • INCREMENTAL LOAD FIELDS: Specify the fields that will be used for incremental loading (e.g., timestamp or ID fields).

  • SCHEMA COLUMNS: Refresh the schema to load column names and types from the database table.
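The general idea behind a watermark-based incremental load can be sketched in plain Python. This is an illustration of the technique only, not the node's actual implementation: the helper names (`read_watermark`, `write_watermark`, `incremental_read`), the single-value watermark file format, the in-memory list standing in for the Hive table, and the strict "greater than" comparison are all assumptions made for the example.

```python
import os
import tempfile


def read_watermark(path):
    """Return the stored watermark string, or None if no load has run yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        value = f.read().strip()
    return value or None


def write_watermark(path, value):
    """Persist the highest value seen so far (hypothetical one-line format)."""
    with open(path, "w") as f:
        f.write(value)


def incremental_read(rows, field, watermark_path):
    """Return rows whose `field` is newer than the watermark, then advance it."""
    last = read_watermark(watermark_path)
    new_rows = [r for r in rows if last is None or r[field] > last]
    if new_rows:
        write_watermark(watermark_path, max(r[field] for r in new_rows))
    return new_rows


# In-memory stand-in for a Hive table; ISO dates compare correctly as strings.
table = [
    {"id": 1, "sale_date": "2024-01-01"},
    {"id": 2, "sale_date": "2024-01-02"},
]
wm_path = os.path.join(tempfile.mkdtemp(), "sales_data_watermark.txt")

first_load = incremental_read(table, "sale_date", wm_path)   # no watermark yet
table.append({"id": 3, "sale_date": "2024-01-03"})
second_load = incremental_read(table, "sale_date", wm_path)  # only the new row
third_load = incremental_read(table, "sale_date", wm_path)   # nothing new
```

In this sketch the first run returns every row because no watermark exists, and later runs return only rows added since the previous run.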

Examples

Hive Incremental Node Examples

Example Parameter Values

  • OUTPUT STORAGE LEVEL: DEFAULT

  • HIVE DATABASE: retail_db

  • HIVE TABLE: sales_data

  • WATERMARK FILE PATH: /user/hive/watermark/sales_data_watermark.txt

  • INCREMENTAL LOAD FIELDS: sale_date

  • SCHEMA COLUMNS: Click “Refresh Schema” to load columns from the specified Hive table.
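To make the example values concrete, the sketch below builds the kind of filtered query that an incremental read over them might correspond to. It is a hypothetical illustration: the function `build_incremental_query`, the `watermarks` dictionary, the sample date, and the choice to combine multiple fields with AND are assumptions, and the query the node actually issues may differ.

```python
def build_incremental_query(database, table, filter_fields, watermarks):
    """Build a SELECT that keeps only rows newer than the known watermarks.

    filter_fields is a comma-separated string, as in INCREMENTAL LOAD FIELDS.
    watermarks maps a field name to the last value loaded (assumed format).
    Values are interpolated directly for readability; this is not safe for
    untrusted input.
    """
    fields = [f.strip() for f in filter_fields.split(",") if f.strip()]
    query = f"SELECT * FROM {database}.{table}"
    conditions = [f"{f} > '{watermarks[f]}'" for f in fields if f in watermarks]
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


# Values taken from the example above.
config = {
    "database": "retail_db",
    "table": "sales_data",
    "filterFields": "sale_date",
}

# First run: no watermark recorded yet, so the whole table is read.
first_query = build_incremental_query(
    config["database"], config["table"], config["filterFields"], {}
)

# Later run: a hypothetical watermark value from the previous load.
next_query = build_incremental_query(
    config["database"], config["table"], config["filterFields"],
    {"sale_date": "2024-01-02"},
)
print(first_query)
print(next_query)
```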