Feature Selection
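Computes per-feature importance for classification, regression, or clustering tasks. Supported estimators are linear and logistic regression (importance = absolute coefficient values), RandomForest and GBT (impurity-based importances), and KMeans (a Calinski-Harabasz (CH) style score computed per feature). Features can optionally be scaled before modeling.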
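The KMeans score mentioned above can be illustrated with a small sketch. The node's exact formula is not documented here, so this assumes the standard Calinski-Harabasz ratio (between-cluster dispersion over within-cluster dispersion) applied to one feature at a time; the function name `ch_score_per_feature` and the sample data are hypothetical.

```python
from collections import defaultdict


def ch_score_per_feature(values, cluster_ids):
    """Calinski-Harabasz-style ratio for a single numeric feature (assumed formula).

    values      -- list of floats, one per row
    cluster_ids -- cluster assignment per row (e.g. from k-means), same length
    """
    n = len(values)
    groups = defaultdict(list)
    for v, c in zip(values, cluster_ids):
        groups[c].append(v)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more rows than clusters")

    overall_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}

    # Between-cluster dispersion: distance of each cluster mean from the overall mean.
    between = sum(len(g) * (means[c] - overall_mean) ** 2 for c, g in groups.items())
    # Within-cluster dispersion: spread of rows around their own cluster mean.
    within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)

    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


clusters = [0, 0, 0, 1, 1, 1]
separating = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]  # differs clearly between clusters
noisy = [1.0, 5.0, 3.0, 1.1, 4.9, 3.0]       # looks the same in both clusters

scores = {
    "separating": ch_score_per_feature(separating, clusters),
    "noisy": ch_score_per_feature(noisy, clusters),
}
```

Under this assumption, a feature whose values differ sharply between clusters receives a much higher score than one distributed similarly in every cluster, which is what makes the ratio usable as a per-feature ranking.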
Input
Takes a Spark DataFrame as input.
Output
Spark DataFrame of feature rankings with columns: feature, model_importance, std_unscaled, ch_feature, scaling_used.
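The `std_unscaled` and `scaling_used` columns are easier to interpret with one general fact in mind: an absolute coefficient depends on the unit of its feature. The sketch below is not the node's implementation; it uses a one-feature least-squares fit in plain Python and assumes standard scaling (mean 0, sample standard deviation 1) to show the effect. The helper names are hypothetical.

```python
import statistics


def ols_slope(x, y):
    """Slope of a one-feature least-squares fit y ~ a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def standardize(x):
    """Centre to mean 0 and scale to sample standard deviation 1 (assumed scaler)."""
    mx, sx = statistics.fmean(x), statistics.stdev(x)
    return [(xi - mx) / sx for xi in x]


y = [2.1, 3.9, 6.2, 7.8, 10.1]
metres = [1.0, 2.0, 3.0, 4.0, 5.0]
millimetres = [m * 1000 for m in metres]  # same information, different unit

# Unscaled: the coefficient magnitude changes with the unit.
raw_m = abs(ols_slope(metres, y))
raw_mm = abs(ols_slope(millimetres, y))

# Scaled: both versions of the feature yield the same magnitude.
scaled_m = abs(ols_slope(standardize(metres), y))
scaled_mm = abs(ols_slope(standardize(millimetres), y))
```

In this sketch the scaled coefficient equals the unscaled coefficient multiplied by the feature's unscaled standard deviation, so coefficient-based importances are only comparable across features once scale is taken into account.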
Type
transform
Class
fire.nodes.fe.NodeFeatureImportance
Fields
| Name | Title | Description |
|---|---|---|
| modelType | Model Type | Choose task type. Label is required for classification/regression. |
| modelName | Model | Estimator used to derive importances. For classification use logistic_regression/random_forest/gbt; for regression use linear_regression/random_forest/gbt; for clustering use kmeans. |
| label | Label Column | Required for classification/regression. |
| features | Feature Columns | Numeric feature columns used for importance computation. |
| nClusters | Number of Clusters (k) | Used only when Model Type = clustering (k-means). |
| scaling | Scaling | Optional scaling applied before modeling. Unscaled stats are always computed from original columns. |
| topK | Top K Rows | Limit output to top K rows (0 = no limit). |
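The constraints between these fields can be summarized in a short sketch. It is not the node's own validation code: the helper names are hypothetical, the `modelType` values `classification`, `regression`, and `clustering` are assumed from the descriptions above, and sorting by `model_importance` in descending order is an assumption about how the ranking is produced.

```python
# Allowed estimators per task type, taken from the Model field description.
ALLOWED_MODELS = {
    "classification": {"logistic_regression", "random_forest", "gbt"},
    "regression": {"linear_regression", "random_forest", "gbt"},
    "clustering": {"kmeans"},
}


def validate_config(config):
    """Return a list of problems; an empty list means the settings are consistent."""
    model_type = config.get("modelType")
    if model_type not in ALLOWED_MODELS:
        return [f"unknown modelType: {model_type!r}"]

    problems = []
    model_name = config.get("modelName")
    if model_name not in ALLOWED_MODELS[model_type]:
        problems.append(f"modelName {model_name!r} is not valid for {model_type}")
    # The label column is only optional for clustering.
    if model_type != "clustering" and not config.get("label"):
        problems.append("label is required for classification/regression")
    return problems


def apply_top_k(rows, top_k):
    """Rank rows by importance (descending, assumed) and keep top_k; 0 keeps all."""
    ranked = sorted(rows, key=lambda r: r["model_importance"], reverse=True)
    return ranked if top_k == 0 else ranked[:top_k]


rows = [
    {"feature": "a", "model_importance": 0.1},
    {"feature": "b", "model_importance": 0.7},
    {"feature": "c", "model_importance": 0.2},
]
top_two = apply_top_k(rows, 2)   # keeps the two highest-ranked features
everything = apply_top_k(rows, 0)  # 0 means no limit
```

Treating `0` as "no limit" has to be handled explicitly, because a plain slice `ranked[:0]` would return an empty list.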