Feature Selection

Computes per-feature importance for classification, regression, or clustering tasks. Supported estimators are linear and logistic regression (absolute coefficient values), RandomForest and GBT (impurity-based importances), and KMeans (a CH-style score per feature). Scaling before modeling is optional.
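To make the KMeans case concrete, here is a minimal sketch of one way a per-feature CH-style score could be computed. It assumes "CH" refers to the Calinski-Harabasz index, which this page does not spell out: for a single feature, the score is the between-cluster dispersion divided by the within-cluster dispersion, each normalized by its degrees of freedom. The function name `ch_score_per_feature` and the plain-Python inputs are hypothetical; the node itself operates on a Spark DataFrame and may compute its score differently.

```python
from collections import defaultdict


def ch_score_per_feature(rows, labels):
    """Return one Calinski-Harabasz-style score per feature column.

    rows   : list of equal-length numeric lists (one list per sample)
    labels : cluster id for each sample (for example, from k-means)

    Assumption: "CH" means Calinski-Harabasz; this is an illustration,
    not the node's actual implementation.
    """
    n = len(rows)
    clusters = defaultdict(list)
    for row, label in zip(rows, labels):
        clusters[label].append(row)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more samples than clusters")

    scores = []
    for j in range(len(rows[0])):
        overall_mean = sum(row[j] for row in rows) / n
        between = 0.0  # dispersion of cluster means around the overall mean
        within = 0.0   # dispersion of samples around their own cluster mean
        for members in clusters.values():
            values = [row[j] for row in members]
            mean = sum(values) / len(values)
            between += len(values) * (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in values)
        if within == 0.0:
            # Perfectly tight clusters: avoid division by zero.
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append((between / (k - 1)) / (within / (n - k)))
    return scores


# Feature 0 separates the two clusters cleanly; feature 1 does not.
rows = [[0.0, 1.0], [1.0, 3.0], [10.0, 1.0], [11.0, 3.0]]
labels = [0, 0, 1, 1]
scores = ch_score_per_feature(rows, labels)
print(scores)  # [200.0, 0.0]
```

A feature whose values differ strongly between clusters but vary little inside each cluster gets a high score, which is why feature 0 ranks far above feature 1 in this toy data.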

Input

Takes a Spark DataFrame as input.

Output

Produces a Spark DataFrame of feature rankings with the columns feature, model_importance, std_unscaled, ch_feature, and scaling_used.

Type

transform

Class

fire.nodes.fe.NodeFeatureImportance

Fields

Name	Title	Description
modelType	Model Type	Task type: classification, regression, or clustering. A label is required for classification and regression.
modelName	Model	Estimator used to derive importances. For classification use logistic_regression/random_forest/gbt; for regression use linear_regression/random_forest/gbt; for clustering use kmeans.
label	Label Column	Required for classification/regression.
features	Feature Columns	Numeric feature columns used for importance computation.
nClusters	Number of Clusters (k)	Used only when Model Type = clustering (k-means).
scaling	Scaling	Optional scaling applied before modeling. Unscaled stats are always computed from original columns.
topK	Top K Rows	Limit output to top K rows (0 = no limit).
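
The constraints in the table above can be summarized as a small validation sketch. The dictionary keys mirror the field names in the table; `validate_config` and `apply_top_k` are hypothetical helpers written for illustration and are not part of the node's API. The literal `modelType` strings and the rule that at least one feature column must be given are assumptions inferred from the descriptions.

```python
# Allowed estimators per task type, as listed in the modelName description.
ALLOWED_MODELS = {
    "classification": {"logistic_regression", "random_forest", "gbt"},
    "regression": {"linear_regression", "random_forest", "gbt"},
    "clustering": {"kmeans"},
}


def validate_config(config):
    """Return a list of problems found in a node configuration dict."""
    model_type = config.get("modelType")
    model_name = config.get("modelName")
    if model_type not in ALLOWED_MODELS:
        return ["unknown modelType: %r" % (model_type,)]

    problems = []
    if model_name not in ALLOWED_MODELS[model_type]:
        problems.append("modelName %r is not valid for %s" % (model_name, model_type))
    if model_type != "clustering" and not config.get("label"):
        problems.append("label is required for classification/regression")
    if not config.get("features"):
        # Assumption: an empty feature list is treated as an error.
        problems.append("at least one feature column is required")
    return problems


def apply_top_k(ranked_rows, top_k):
    """Keep the first top_k rows of an already ranked list; 0 means no limit."""
    return ranked_rows if top_k == 0 else ranked_rows[:top_k]


config = {
    "modelType": "regression",
    "modelName": "logistic_regression",  # wrong: this is a classification estimator
    "label": "price",
    "features": ["sqft", "rooms"],
    "topK": 0,
}
problems = validate_config(config)
print(problems)
```

Here the only reported problem is the mismatch between `modelType` and `modelName`; switching `modelName` to `linear_regression` would make the configuration pass these checks.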