Data Drift
===========

This node calculates the Population Stability Index (PSI) for a set of features by comparing a reference dataset to a test dataset. It is designed to identify potential data drift in both continuous and categorical features.

Input
--------------

A reference dataset and a test dataset provided as DataFrames. The reference dataset serves as the baseline distribution, while the test dataset is used to detect any drift.

Output
--------------

A Spark DataFrame with two columns: 'feature_name' and 'psi_value'. Each row represents a feature and its corresponding PSI value, indicating the level of drift.

Type
---------

ml-estimator

Class
---------

fire.nodes.ml.NodeDataDrift

Fields
---------

.. list-table::
   :widths: 10 5 10
   :header-rows: 1

   * - Name
     - Title
     - Description
   * - inputCols
     - Input Columns
     - A list of feature names on which the PSI will be calculated.
   * - categoricalCols
     - Categorical Columns
     - A list of features that are categorical. All other features are treated as continuous.
   * - numBins
     - Num Bins
     - The number of bins to use when binning continuous features for PSI calculation.
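
The PSI calculation described above can be illustrated with a minimal plain-Python sketch. It is not the node's actual implementation. It uses the standard PSI definition: for each bin (or category), take the share of reference rows ``r`` and the share of test rows ``t``, then sum ``(t - r) * ln(t / r)`` over all bins. The function names, the equal-width binning over the reference range, and the ``EPS`` floor for empty bins are all assumptions made for illustration; the node may bin and smooth differently.

```python
import math
from collections import Counter

# Assumed floor for empty bins so that log() and division stay defined.
# This value is illustrative and not taken from the node's documentation.
EPS = 1e-6


def _psi_from_shares(ref_shares, test_shares):
    """Sum (t - r) * ln(t / r) over matching bins or categories."""
    total = 0.0
    for r, t in zip(ref_shares, test_shares):
        r = max(r, EPS)
        t = max(t, EPS)
        total += (t - r) * math.log(t / r)
    return total


def psi_continuous(reference, test, num_bins=10):
    """PSI for a continuous feature (hypothetical helper).

    Assumption: equal-width bins spanning the reference min..max;
    test values outside that range are clipped into the edge bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / num_bins or 1.0  # fall back if the reference is constant

    def shares(values):
        counts = [0] * num_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), num_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    return _psi_from_shares(shares(reference), shares(test))


def psi_categorical(reference, test):
    """PSI for a categorical feature (hypothetical helper).

    Each distinct value seen in either dataset is treated as its own bin.
    """
    categories = set(reference) | set(test)
    ref_counts, test_counts = Counter(reference), Counter(test)
    ref_shares = [ref_counts[c] / len(reference) for c in categories]
    test_shares = [test_counts[c] / len(test) for c in categories]
    return _psi_from_shares(ref_shares, test_shares)
```

In this sketch, identical reference and test data give a PSI of 0, and the value grows as the two distributions diverge. A commonly cited rule of thumb, not stated by this node's documentation, reads PSI below 0.1 as no significant shift, 0.1 to 0.25 as moderate shift, and above 0.25 as significant shift.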