Moving Average Features

This node computes various global moving average features from a DataFrame containing transactional data.

Input

Takes a DataFrame with a date or timestamp column and, optionally, user, amount, and quantity columns.

Output

Returns the original DataFrame with new global moving average feature columns appended.

Type

pyspark

Class

fire.nodes.fe.NodeMovingAverageFeatures

Fields

Name | Title | Description

dateCol | Date/Timestamp Column | Column representing the transaction date or timestamp.

userCol | User ID Column | Column representing the user or entity ID.

amountCol | Amount Column | Column representing the transaction amount.

quantityCol | Quantity Column | Column representing the quantity.

enable_global_moving_avg_txn_count_per_day | Enable Global Moving Avg Txn Count Per Day | Global average transaction count per day over the window.

global_moving_avg_txn_count_per_day_window | Window Size (days) |

enable_global_moving_avg_gap_days | Enable Global Moving Avg Gap Days | Global average gap days between transactions over the window.

global_moving_avg_gap_days_window | Window Size (days) |

enable_global_hourly_avg_txn_count | Enable Global Hourly Avg Txn Count | Global average transaction count per hour over the window.

global_hourly_avg_txn_count_window_hours | Window Size (hours) |

enable_global_daily_avg_amount | Enable Global Daily Avg Amount (Required: Amount) | Global average amount per day over the window.

global_daily_avg_amount_window | Window Size (days) |

enable_global_moving_avg_amount | Enable Global Moving Avg Amount (Required: Amount) | Global average transaction amount over the window.

global_moving_avg_amount_window | Window Size (days) |

enable_global_moving_avg_sales_per_day | Enable Global Moving Avg Sales Per Day (Required: Quantity) | Global average sales (quantity) per day over the window.

global_moving_avg_sales_per_day_window | Window Size (days) |

enable_global_moving_avg_unique_users_per_day | Enable Global Moving Avg Unique Users Per Day (Required: User) | Global average unique users per day over the window.

global_moving_avg_unique_users_per_day_window | Window Size (days) |

Details

Moving Average Features Node Details

The Moving Average Features node is designed to compute global moving average features from transactional data stored in a DataFrame. It calculates metrics such as average transaction counts, gap days, amounts, sales quantities, and unique users over specified time windows. These features are appended as new columns to the input DataFrame, providing insights into global trends across all transactions.

General:

Date/Timestamp Column:

Specifies the column containing the date or timestamp of the transactions. This is a required field used for all time-based feature computations.

User ID Column:

Specifies an optional column containing the user or entity identifier (e.g., user_id, customer_id). Required for computing global moving average unique users per day.

Amount Column:

Specifies an optional column containing the transaction amount (e.g., purchase value). Required for features like global moving average amount and global daily average amount.

Quantity Column:

Specifies an optional column containing the quantity of items in the transaction. Required for computing global moving average sales per day.

Enable Global Moving Avg Txn Count Per Day:

When enabled, calculates the global average transaction count per day over a specified window (in days).

Global Moving Avg Txn Count Per Day Window:

Specifies the window size (in days) for the global moving average transaction count per day.

Enable Global Moving Avg Gap Days:

When enabled, calculates the global average gap, in days, between consecutive transactions over a specified window (in days).

Global Moving Avg Gap Days Window:

Specifies the window size (in days) for the global moving average gap days calculation.

Enable Global Hourly Avg Txn Count:

When enabled, calculates the global average transaction count per hour over a specified window (in hours).

Global Hourly Avg Txn Count Window:

Specifies the window size (in hours) for the global hourly average transaction count.

Enable Global Daily Avg Amount:

When enabled, calculates the global average transaction amount per day over a specified window. Requires the Amount Column.

Global Daily Avg Amount Window:

Specifies the window size (in days) for the global daily average amount calculation.

Enable Global Moving Avg Amount:

When enabled, calculates the global average transaction amount over a specified window. Requires the Amount Column.

Global Moving Avg Amount Window:

Specifies the window size (in days) for the global moving average amount calculation.

Enable Global Moving Avg Sales Per Day:

When enabled, calculates the global average sales (quantity) per day over a specified window. Requires the Quantity Column.

Global Moving Avg Sales Per Day Window:

Specifies the window size (in days) for the global moving average sales per day calculation.

Enable Global Moving Avg Unique Users Per Day:

When enabled, calculates the global average number of unique users per day over a specified window. Requires the User ID Column.

Global Moving Avg Unique Users Per Day Window:

Specifies the window size (in days) for the global moving average unique users per day calculation.
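The column requirements described above can be checked before a workflow runs. The following is a minimal sketch, assuming the node configuration is available as a plain Python dict keyed by the field names from the Fields table; the `REQUIRED_COLUMN` mapping and the `missing_columns` helper are hypothetical and are not part of the node itself.

```python
# Hypothetical mapping from each column-dependent enable flag to the column
# field it needs. Field names are taken from the Fields table; the mapping
# itself is an illustration, not the node's own code.
REQUIRED_COLUMN = {
    "enable_global_daily_avg_amount": "amountCol",
    "enable_global_moving_avg_amount": "amountCol",
    "enable_global_moving_avg_sales_per_day": "quantityCol",
    "enable_global_moving_avg_unique_users_per_day": "userCol",
}


def missing_columns(config):
    """Return (flag, column_field) pairs for enabled features whose column is unset."""
    problems = []
    # The date/timestamp column is required by every feature.
    if not config.get("dateCol"):
        problems.append(("(all features)", "dateCol"))
    for flag, column_field in REQUIRED_COLUMN.items():
        if config.get(flag) and not config.get(column_field):
            problems.append((flag, column_field))
    return problems


config = {
    "dateCol": "eventDate",
    "amountCol": "amount",
    "enable_global_moving_avg_amount": True,
    "enable_global_moving_avg_unique_users_per_day": True,  # userCol is not set
}
print(missing_columns(config))
```

Here the amount feature passes because `amountCol` is set, while the unique-users feature is reported because `userCol` is missing.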

Output:

The node outputs the original DataFrame with additional columns based on the enabled features, where <window> is the window size configured for that feature:

  • moving_avg_amount_<window>d

  • moving_avg_txn_count_per_day_<window>d

  • moving_avg_gap_days_<window>d

  • moving_avg_sales_per_day_<window>d

  • moving_avg_unique_users_per_day_<window>d

  • daily_avg_amount_<window>d

  • hourly_avg_txn_count_per_hour_<window>h
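The naming pattern in the list above can be sketched as a small lookup. This is an illustration only: the feature keys (the enable flag names without the `enable_` prefix) and the `output_column` helper are assumptions made for the example, while the column templates are copied from the list.

```python
# Column name templates copied from the output list; "{w}" stands for the
# window size configured for the feature. The dict keys are hypothetical
# labels derived from the enable flag names.
COLUMN_TEMPLATES = {
    "global_moving_avg_amount": "moving_avg_amount_{w}d",
    "global_moving_avg_txn_count_per_day": "moving_avg_txn_count_per_day_{w}d",
    "global_moving_avg_gap_days": "moving_avg_gap_days_{w}d",
    "global_moving_avg_sales_per_day": "moving_avg_sales_per_day_{w}d",
    "global_moving_avg_unique_users_per_day": "moving_avg_unique_users_per_day_{w}d",
    "global_daily_avg_amount": "daily_avg_amount_{w}d",
    "global_hourly_avg_txn_count": "hourly_avg_txn_count_per_hour_{w}h",
}


def output_column(feature, window):
    """Build the output column name for a feature and its window size."""
    return COLUMN_TEMPLATES[feature].format(w=window)


print(output_column("global_moving_avg_amount", 7))
print(output_column("global_hourly_avg_txn_count", 24))
```

Note that the hourly feature is the only one whose suffix ends in `h`; all other windows are expressed in days.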

Examples

Moving Average Features Node Examples

Input:

A DataFrame contains the following data:

  • eventDate: [“2023-01-01 08:00:00”, “2023-01-01 12:00:00”, “2023-01-02 10:00:00”, “2023-01-03 14:00:00”, “2023-01-04 18:00:00”]

  • userId: [“U1”, “U2”, “U1”, “U2”, “U3”]

  • amount: [100.0, 150.0, 200.0, 300.0, 500.0]

  • quantity: [2, 3, 1, 4, 5]

The Moving Average Features node is configured as follows:

  • Date/Timestamp Column: eventDate

  • User ID Column: userId

  • Amount Column: amount

  • Quantity Column: quantity

  • Enable Global Moving Avg Txn Count Per Day: true

  • Global Moving Avg Txn Count Per Day Window: 7

  • Enable Global Moving Avg Gap Days: true

  • Global Moving Avg Gap Days Window: 7

  • Enable Global Hourly Avg Txn Count: true

  • Global Hourly Avg Txn Count Window: 24

  • Enable Global Daily Avg Amount: true

  • Global Daily Avg Amount Window: 7

  • Enable Global Moving Avg Amount: true

  • Global Moving Avg Amount Window: 7

  • Enable Global Moving Avg Sales Per Day: true

  • Global Moving Avg Sales Per Day Window: 7

  • Enable Global Moving Avg Unique Users Per Day: true

  • Global Moving Avg Unique Users Per Day Window: 7
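For readers who work with field names rather than titles, the configuration above can be written against the names in the Fields table. The exact serialization the node uses is not shown in this document, so the plain Python dict below is purely illustrative.

```python
# Example configuration keyed by the field names from the Fields table.
# The dict form is an assumption for illustration; only the names and
# values come from the document.
example_config = {
    "dateCol": "eventDate",
    "userCol": "userId",
    "amountCol": "amount",
    "quantityCol": "quantity",
    "enable_global_moving_avg_txn_count_per_day": True,
    "global_moving_avg_txn_count_per_day_window": 7,
    "enable_global_moving_avg_gap_days": True,
    "global_moving_avg_gap_days_window": 7,
    "enable_global_hourly_avg_txn_count": True,
    "global_hourly_avg_txn_count_window_hours": 24,
    "enable_global_daily_avg_amount": True,
    "global_daily_avg_amount_window": 7,
    "enable_global_moving_avg_amount": True,
    "global_moving_avg_amount_window": 7,
    "enable_global_moving_avg_sales_per_day": True,
    "global_moving_avg_sales_per_day_window": 7,
    "enable_global_moving_avg_unique_users_per_day": True,
    "global_moving_avg_unique_users_per_day_window": 7,
}

# List the features that are switched on in this configuration.
enabled = sorted(
    name for name, value in example_config.items()
    if name.startswith("enable_") and value is True
)
print(len(enabled), "features enabled")
```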

Output:

The node processes the DataFrame and produces the following result (values are illustrative, assuming calculations are based on the input data):

  • eventDate: “2023-01-01 08:00:00”, userId: “U1”, amount: 100.0, quantity: 2

moving_avg_txn_count_per_day_7d: 0.2857 (2 transactions over 7 days)

moving_avg_gap_days_7d: null (insufficient data for gap calculation)

hourly_avg_txn_count_per_hour_24h: 0.0833 (2 transactions over 24 hours)

daily_avg_amount_7d: 250.0 (250.0 total amount over 1 day)

moving_avg_amount_7d: 35.7143 (250.0 total amount over 7 days)

moving_avg_sales_per_day_7d: 0.7143 (5 total quantity over 7 days)

moving_avg_unique_users_per_day_7d: 2.0 (2 unique users over 1 day)

  • eventDate: “2023-01-01 12:00:00”, userId: “U2”, amount: 150.0, quantity: 3

moving_avg_txn_count_per_day_7d: 0.2857

moving_avg_gap_days_7d: 0.1667 (4 hours gap converted to days)

hourly_avg_txn_count_per_hour_24h: 0.0833

daily_avg_amount_7d: 250.0

moving_avg_amount_7d: 35.7143

moving_avg_sales_per_day_7d: 0.7143

moving_avg_unique_users_per_day_7d: 2.0

  • eventDate: “2023-01-02 10:00:00”, userId: “U1”, amount: 200.0, quantity: 1

moving_avg_txn_count_per_day_7d: 0.4286 (3 transactions over 7 days)

moving_avg_gap_days_7d: 0.5417 (average of the 4-hour and 22-hour gaps, converted to days)

hourly_avg_txn_count_per_hour_24h: 0.0417 (1 transaction over 24 hours)

daily_avg_amount_7d: 225.0 (450.0 total amount over 2 days)

moving_avg_amount_7d: 64.2857 (450.0 total amount over 7 days)

moving_avg_sales_per_day_7d: 0.8571 (6 total quantity over 7 days)

moving_avg_unique_users_per_day_7d: 1.5 (daily unique-user counts of 2 and 1, averaged over 2 days)

  • eventDate: “2023-01-03 14:00:00”, userId: “U2”, amount: 300.0, quantity: 4

moving_avg_txn_count_per_day_7d: 0.5714 (4 transactions over 7 days)

moving_avg_gap_days_7d: 0.75 (average of the 4-hour, 22-hour, and 28-hour gaps, converted to days)

hourly_avg_txn_count_per_hour_24h: 0.0417

daily_avg_amount_7d: 250.0 (750.0 total amount over 3 days)

moving_avg_amount_7d: 107.1429 (750.0 total amount over 7 days)

moving_avg_sales_per_day_7d: 1.4286 (10 total quantity over 7 days)

moving_avg_unique_users_per_day_7d: 1.3333 (daily unique-user counts of 2, 1, and 1, averaged over 3 days)

  • eventDate: “2023-01-04 18:00:00”, userId: “U3”, amount: 500.0, quantity: 5

moving_avg_txn_count_per_day_7d: 0.7143 (5 transactions over 7 days)

moving_avg_gap_days_7d: 0.8542 (average of the 4-hour, 22-hour, 28-hour, and 28-hour gaps, converted to days)

hourly_avg_txn_count_per_hour_24h: 0.0417

daily_avg_amount_7d: 312.5 (1250.0 total amount over 4 days)

moving_avg_amount_7d: 178.5714 (1250.0 total amount over 7 days)

moving_avg_sales_per_day_7d: 2.1429 (15 total quantity over 7 days)

moving_avg_unique_users_per_day_7d: 1.25 (daily unique-user counts of 2, 1, 1, and 1, averaged over 4 days)
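The arithmetic behind the transaction count and sales columns can be reproduced in plain Python. This is a sketch of the calculation implied by the notes in parentheses (a running total up to each row's calendar day, divided by the 7-day window), not the node's PySpark implementation; the helper name `windowed_total_per_day` is hypothetical. The raw gaps between consecutive transactions are included as well, expressed in days.

```python
from datetime import datetime, timedelta

event_dates = [
    "2023-01-01 08:00:00",
    "2023-01-01 12:00:00",
    "2023-01-02 10:00:00",
    "2023-01-03 14:00:00",
    "2023-01-04 18:00:00",
]
quantities = [2, 3, 1, 4, 5]
WINDOW_DAYS = 7

timestamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in event_dates]


def windowed_total_per_day(values, window_days=WINDOW_DAYS):
    """For each row, total the values of all rows whose calendar day falls
    within the window ending on that row's day, then divide by the window
    length in days. Assumption: the window is aligned to calendar days."""
    result = []
    for ts in timestamps:
        start = ts.date() - timedelta(days=window_days)
        total = sum(
            v for t, v in zip(timestamps, values)
            if start < t.date() <= ts.date()
        )
        result.append(round(total / window_days, 4))
    return result


# Each transaction counts as 1 for the transaction-count feature.
txn_count_per_day = windowed_total_per_day([1] * len(timestamps))
sales_per_day = windowed_total_per_day(quantities)

# Gaps between consecutive transactions, converted from seconds to days.
gaps_in_days = [
    round((later - earlier).total_seconds() / 86400, 4)
    for earlier, later in zip(timestamps, timestamps[1:])
]

print(txn_count_per_day)
print(sales_per_day)
print(gaps_in_days)
```

The first gap is 4 hours, which is the 0.1667 days reported for the second row. Because all five transactions fall within four days, the 7-day window never drops a row in this example.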