Feature Selection

Computes per-feature importance for classification, regression, or clustering tasks. Supported estimators are linear/logistic regression (absolute coefficient values), RandomForest/GBT (impurity-based importances), and KMeans (a CH-style score per feature). Scaling is optional.
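The exact formula behind the KMeans "CH-style score per feature" is not given here. As a rough illustration only, the sketch below assumes it resembles the Calinski-Harabasz ratio (between-cluster variance over within-cluster variance) evaluated on one feature at a time. The function name ch_score_per_feature and the sample data are hypothetical, and the code is plain Python rather than the node's Spark implementation.

```python
from collections import defaultdict


def ch_score_per_feature(rows, labels):
    """Return one Calinski-Harabasz-style score per feature column.

    rows   : list of equal-length numeric lists (one list per record)
    labels : cluster id for each record, e.g. from a fitted k-means model

    Assumption: this is the classic CH ratio applied to a single feature,
    not necessarily the formula used by the node.
    """
    n = len(rows)
    clusters = defaultdict(list)
    for row, label in zip(rows, labels):
        clusters[label].append(row)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more rows than clusters")

    scores = []
    for j in range(len(rows[0])):
        overall_mean = sum(row[j] for row in rows) / n
        between = 0.0  # spread of cluster means around the overall mean
        within = 0.0   # spread of points around their own cluster mean
        for members in clusters.values():
            values = [m[j] for m in members]
            mean = sum(values) / len(values)
            between += len(values) * (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in values)
        if within == 0.0:
            # Perfectly tight clusters: avoid dividing by zero
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append((between / (k - 1)) / (within / (n - k)))
    return scores


# Hypothetical data: feature 0 separates the two clusters, feature 1 does not
rows = [[0.0, 5.0], [1.0, 6.0], [10.0, 5.0], [11.0, 6.0]]
labels = [0, 0, 1, 1]
scores = ch_score_per_feature(rows, labels)
print(scores)  # [200.0, 0.0]
```

A feature whose values differ strongly between clusters scores high, while a feature distributed identically in every cluster scores near zero.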

Input

Takes a Spark DataFrame as input.

Output

Spark DataFrame of feature rankings with columns: feature, model_importance, std_unscaled, ch_feature, scaling_used.

Type

transform

Class

fire.nodes.fe.NodeFeatureImportance

Fields

Name

Title

Description

modelType

Model Type

Task type: classification, regression, or clustering. A label column is required for classification and regression.

modelName

Model

Estimator used to derive importances. For classification use logistic_regression/random_forest/gbt; for regression use linear_regression/random_forest/gbt; for clustering use kmeans.

label

Label Column

Name of the label column. Required for classification/regression.

features

Feature Columns

Numeric feature columns used for importance computation.

nClusters

Number of Clusters (k)

Used only when Model Type = clustering (k-means).

scaling

Scaling

Optional scaling applied to the feature columns before modeling. Unscaled statistics (such as std_unscaled) are always computed from the original columns.

topK

Top K Rows

Limit output to top K rows (0 = no limit).
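The field rules above (which estimators go with which task, when a label is required, and how topK behaves) can be summarized in a short sketch. This is an illustration in plain Python, not the node's own validation logic. The helper names validate_config and apply_top_k are hypothetical, and the modelType values "classification", "regression", and "clustering" are assumed spellings.

```python
# Estimator names per task, as listed in the modelName description.
# The task-type keys are assumed spellings of the Model Type options.
ALLOWED_MODELS = {
    "classification": {"logistic_regression", "random_forest", "gbt"},
    "regression": {"linear_regression", "random_forest", "gbt"},
    "clustering": {"kmeans"},
}


def validate_config(config):
    """Return a list of problems found in a field dictionary (empty if none)."""
    model_type = config.get("modelType")
    if model_type not in ALLOWED_MODELS:
        return [f"unknown modelType: {model_type!r}"]

    problems = []
    model_name = config.get("modelName")
    if model_name not in ALLOWED_MODELS[model_type]:
        problems.append(f"{model_name!r} is not valid for {model_type}")
    # A label is only needed for the supervised task types
    if model_type != "clustering" and not config.get("label"):
        problems.append("label is required for classification/regression")
    return problems


def apply_top_k(ranked_rows, top_k):
    """Keep the first top_k rows of an already ranked list; 0 means no limit."""
    return list(ranked_rows) if top_k == 0 else list(ranked_rows[:top_k])


# Hypothetical configurations
ok = {"modelType": "classification", "modelName": "gbt", "label": "y"}
bad = {"modelType": "regression", "modelName": "kmeans"}
print(validate_config(ok))   # []
print(validate_config(bad))  # two problems: wrong estimator, missing label
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfigured field at once.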