Save As HIVE Table
Saves the DataFrame into an Apache HIVE Table.
Type
transform
Class
fire.nodes.save.NodeSaveAsTable
Fields

| Name | Title | Description |
|---|---|---|
| database | HIVE Database | Name of the HIVE Database |
| table | HIVE Table | Name of the HIVE table |
| format | Format | File format when saving to HIVE Table |
| saveMode | Save Mode | Whether to Append, Overwrite or Error if the path Exists |
| advanced | Advanced | |
| partitionBy | Partition By | List of columns to partition by - separated by space |
| numBuckets | NumBuckets | Number of buckets |
| bucketBy | Bucket By | List of columns to bucket by - separated by space |
Details
Save As HIVE Table Node Details
Saves the DataFrame into an Apache HIVE Table.
Parameters to be set:
General:
OUTPUT STORAGE LEVEL: Keep this as DEFAULT.
HIVE DATABASE: Specify the HIVE database where the table will be created.
HIVE TABLE: Specify the name of the HIVE table to which the data will be written.
FORMAT: Choose the file format for the HIVE table (e.g., Parquet, ORC, CSV, JSON).
SAVE MODE: Choose how to write the data if the table already exists (Append, Overwrite, ErrorIfExists, Ignore).
Advanced:
PARTITION BY: (Optional) Specify columns to partition the HIVE table. You can select multiple columns from the “Available” list and move them to the “Selected” list to define the partitioning scheme.
NUM BUCKETS: Specify the number of buckets to use when bucketing the HIVE table.
BUCKET BY: (Optional) Specify columns to bucket the HIVE table. You can select multiple columns from the “Available” list and move them to the “Selected” list to define the bucketing scheme.
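As a rough sketch of how these parameters might correspond to Spark's DataFrameWriter API, the Python function below chains format, mode, partitionBy, bucketBy, and saveAsTable. This is an assumption about the mapping, not the node's actual implementation: the function name save_as_hive_table and its argument names are hypothetical, and df is expected to be a PySpark DataFrame.

```python
def save_as_hive_table(df, database, table, fmt="parquet", save_mode="Overwrite",
                       partition_by=(), num_buckets=None, bucket_by=()):
    """Write `df` to `database.table` through its DataFrameWriter (`df.write`).

    Hypothetical helper: the argument names mirror the node's fields, but the
    mapping to Spark calls is an assumption, not the node's confirmed behavior.
    """
    # Spark accepts lowercase format names ("parquet", "orc", "csv", "json")
    # and mode names ("append", "overwrite", "errorifexists", "ignore").
    writer = df.write.format(fmt.lower()).mode(save_mode.lower())

    # PARTITION BY is optional; skip the call when no columns are selected.
    if partition_by:
        writer = writer.partitionBy(*partition_by)

    # BUCKET BY is optional, but it needs a positive NUM BUCKETS when used.
    if bucket_by:
        if not num_buckets or num_buckets <= 0:
            raise ValueError("num_buckets must be a positive integer when bucket_by is set")
        writer = writer.bucketBy(num_buckets, *bucket_by)

    # saveAsTable registers the table in the metastore under database.table.
    writer.saveAsTable(f"{database}.{table}")
```

To run this against a real cluster, df would typically come from a SparkSession created with Hive support enabled (SparkSession.builder.enableHiveSupport()).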
Examples
Save As HIVE Table Node Examples
Example of Connection Values
General:
HIVE DATABASE: my_hive_db
HIVE TABLE: processed_customer_data
FORMAT: Parquet
SAVE MODE: Overwrite
Advanced:
PARTITION BY: (year,month,country). This creates a partitioned HIVE table in which data is organized into directories by year, month, and country.
NUM BUCKETS: 32
BUCKET BY: customer_id. This creates a bucketed HIVE table in which data is divided into 32 buckets based on the customer_id column.
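To make the resulting layout more concrete, the short Python sketch below shows the Hive-style directory name a row would fall under when partitioned by year, month, and country, and how a value is assigned to one of 32 buckets. The sample row and the helper names partition_path and bucket_index are hypothetical. The bucket function uses CRC32 purely as a stand-in; Spark applies its own hash function, so actual bucket numbers will differ.

```python
import zlib


def partition_path(row, partition_cols):
    """Return the Hive-style relative directory for `row` (col=value per level)."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def bucket_index(value, num_buckets):
    """Illustrative bucket assignment: hash the value, then take it modulo the
    bucket count. CRC32 is a stand-in here, not the hash Spark actually uses."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


# Hypothetical sample row, for illustration only.
row = {"year": 2024, "month": 3, "country": "US", "customer_id": "C-1001"}

print(partition_path(row, ["year", "month", "country"]))  # year=2024/month=3/country=US
print(bucket_index(row["customer_id"], 32))               # an integer from 0 to 31
```

Rows that share the same year, month, and country land in the same directory, and rows with the same customer_id always map to the same bucket within it.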