XGBoost Classifier
==================


Input
--------------
The node takes a DataFrame as input and trains an XGBoost classification model on it.

Output
--------------
The generated XGBoost model is passed to the next nodes, along with the input DataFrame.

Type
--------- 

ml-estimator

Class
--------- 

fire.nodes.ml.NodeXGBoostClassifier

Fields
--------- 

.. list-table::
      :widths: 10 5 10
      :header-rows: 1

      * - Name
        - Title
        - Description
      * - featuresCol
        - Features Column
        - Features column of type VectorUDT for model fitting
      * - labelCol
        - Label Column
        - The label column for model fitting
      * - predictionCol
        - Prediction Column
        - The prediction column created during model scoring.
      * - splitRatio
        - Split Ratio
        - Split Ratio
      * - numClass
        - Num Class
        - The number of classes in the classification problem
      * - maxDepth
        - Max Depth
        - The Maximum depth of a tree
      * - maxBins
        - Max Bins
        - The maximum number of bins used for discretizing continuous features. Must be >= 2 and >= number of categories in any categorical feature.
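      * - maxLeaves
        - Max Leaves
        - The maximum number of leaves per tree
      * - numRound
        - Num Round
        - The number of boosting rounds (trees) to build
      * - numWorkers
        - Num Workers
        - The number of XGBoost workers (Spark tasks) used for distributed training
      * - objective
        - Objective
        - The learning objective to optimize, for example multi:softprob for multi-class classification
      * - eta
        - Eta
        - The learning rate (step-size shrinkage) applied at each boosting round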
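      * - regLambda
        - Reg Lambda
        - L2 regularization term on leaf weights
      * - regAlpha
        - Reg Alpha
        - L1 regularization term on leaf weights
      * - subsample
        - Sub Sample
        - Fraction of the training rows randomly sampled before growing each tree
      * - sampleType
        - Sample Type
        - Sampling algorithm used by the dart booster: uniform or weighted
      * - treeMethod
        - Tree Method
        - The tree construction algorithm, for example auto, exact, approx or hist
      * - useExternalMemory
        - Use External Memory
        - Whether to use external memory as a cache during training
      * - seed
        - Seed
        - Random seed
      * - baseScore
        - Base Score
        - The initial prediction score of all instances (global bias)
      * - minChildWeight
        - Min Child Weight
        - Minimum sum of instance weights (hessian) needed in a child; larger values make the model more conservative
      * - colsampleBylevel
        - Col Sample By Level
        - Fraction of columns sampled for each new depth level of a tree
      * - colsampleBytree
        - Col Sample By Tree
        - Fraction of columns sampled when constructing each tree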
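      * - minSplitLoss
        - Min Split Loss
        - Minimum loss reduction required to make a further partition on a leaf node (gamma)
      * - maxDeltaStep
        - Max Delta Step
        - Maximum delta step allowed for each leaf output; can help logistic regression when classes are extremely imbalanced
      * - sketchEps
        - Sketch Eps
        - Used by the approx tree method; roughly translates into 1 / sketchEps bins
      * - scalePosWeight
        - Scale Pos Weight
        - Controls the balance of positive and negative weights; useful for unbalanced binary classes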
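      * - growPlicy
        - Grow Policy
        - Controls how new nodes are added to a tree: depthwise or lossguide
      * - normalizeType
        - Normalize Type
        - Normalization algorithm used by the dart booster: tree or forest
      * - skipDrop
        - Skip Drop
        - Probability of skipping the dropout procedure during a boosting round (dart booster)
      * - rateDrop
        - Rate Drop
        - Dropout rate, the fraction of previous trees dropped during dropout (dart booster)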
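
For binary problems with unbalanced classes, the XGBoost parameter documentation suggests ``sum(negative instances) / sum(positive instances)`` as a typical starting value for Scale Pos Weight. The following is a minimal Python sketch of that rule of thumb, assuming the labels are 0/1 values already collected into a plain list; the function name ``suggest_scale_pos_weight`` is hypothetical and not part of the node.

.. code-block:: python

   from collections import Counter


   def suggest_scale_pos_weight(labels):
       """Return sum(negative) / sum(positive) for 0/1 labels.

       Rule of thumb from the XGBoost docs; a starting point for
       tuning scalePosWeight, not a tuned value.
       """
       counts = Counter(labels)
       positives = counts.get(1, 0)
       negatives = counts.get(0, 0)
       if positives == 0:
           raise ValueError("no positive examples; the ratio is undefined")
       return negatives / positives


   # Made-up label column: 90 negatives, 10 positives.
   labels = [0] * 90 + [1] * 10
   print(suggest_scale_pos_weight(labels))  # 9.0
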


Details
-------
For details, see the XGBoost4J-Spark tutorial: https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#xgboost4j-spark-tutorial-version-0-9


XGBoost Classifier Node Details
+++++++++++++++++++++++++++++++


This node implements the XGBoost algorithm for classification tasks. It can be used for a variety of classification problems, including binary classification (e.g., spam detection) and multi-class classification (e.g., image recognition).


Key Parameters:


Features Column: The name of the column containing the features used for training.

Label Column: The name of the column containing the target variable to be predicted.

Prediction Column: The name of the column where the predicted class label is stored when the model is used for scoring.

Num Class: The number of classes in the classification problem.

Max Depth: The maximum depth of each tree in the ensemble. Higher values can lead to overfitting.

Max Bins: The maximum number of bins to use for histogram-based approximations.
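
Histogram-based tree construction replaces each continuous feature value with the index of the bin it falls into, so Max Bins caps the number of candidate split points per feature. The sketch below illustrates the idea with exact quantiles in plain Python; it is a simplification (XGBoost itself uses a weighted quantile sketch), and the helper names and sample incomes are hypothetical.

.. code-block:: python

   import bisect


   def quantile_bin_edges(values, max_bins):
       """Return at most max_bins - 1 cut points that split values into
       roughly equal-count bins (exact quantiles; a simplified stand-in
       for XGBoost's weighted quantile sketch)."""
       if max_bins < 2:
           raise ValueError("max_bins must be at least 2")
       if not values:
           raise ValueError("need at least one value")
       ordered = sorted(values)
       n = len(ordered)
       edges = []
       for i in range(1, max_bins):
           cut = ordered[(i * n) // max_bins]
           # Skip repeated cut points (e.g. many equal values).
           if not edges or cut > edges[-1]:
               edges.append(cut)
       return edges


   def bin_index(value, edges):
       """Bin index in [0, len(edges)]; a value equal to a cut point
       falls into the bin above it."""
       return bisect.bisect_right(edges, value)


   incomes = [12, 18, 25, 31, 40, 52, 67, 80, 95, 120]  # made-up values
   edges = quantile_bin_edges(incomes, max_bins=4)
   print(edges)                                   # [25, 52, 80]
   print([bin_index(v, edges) for v in incomes])  # [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
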

Max Leaves: The maximum number of leaves per tree.
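
Max Depth and Max Leaves both bound the size of a tree: a binary tree of depth d has at most 2^d leaves, and Max Leaves can cap that further. A small sketch of the resulting upper bound, assuming both limits are enforced and following XGBoost's convention that a Max Leaves value of 0 means no leaf limit; the function name is hypothetical.

.. code-block:: python

   def max_possible_leaves(max_depth, max_leaves=0):
       """Upper bound on the number of leaves in one tree.

       Assumes max_depth >= 1. A max_leaves value of 0 (or less) is
       treated as "no leaf limit", following XGBoost's default.
       """
       if max_depth < 1:
           raise ValueError("this sketch assumes max_depth >= 1")
       depth_bound = 2 ** max_depth  # leaves of a full binary tree of this depth
       if max_leaves <= 0:
           return depth_bound
       return min(depth_bound, max_leaves)


   print(max_possible_leaves(6))      # 64
   print(max_possible_leaves(6, 31))  # 31
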

Num Round: The number of boosting rounds (trees) to build.

Num Workers: The number of XGBoost workers (Spark tasks) used to train the model in parallel.

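Objective: The learning objective to optimize. 'multi:softprob' is used for multi-class classification when per-class probabilities are needed; 'binary:logistic' is the usual choice for binary classification.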
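
With 'multi:softprob', the model produces one raw score per class for each row (Num Class scores in total) and normalizes them with a softmax into probabilities that sum to 1; 'multi:softmax' instead returns only the index of the highest-scoring class. A plain-Python sketch of both outputs, using made-up raw scores for a single row:

.. code-block:: python

   import math


   def softprob(raw_scores):
       """Numerically stable softmax over one row's per-class raw scores."""
       top = max(raw_scores)
       exps = [math.exp(s - top) for s in raw_scores]
       total = sum(exps)
       return [e / total for e in exps]


   def softmax_label(raw_scores):
       """Index of the highest-scoring class (what multi:softmax returns)."""
       return max(range(len(raw_scores)), key=lambda k: raw_scores[k])


   row = [0.2, 1.5, -0.3]  # made-up raw scores, Num Class = 3
   probs = softprob(row)
   print([round(p, 3) for p in probs])
   print(softmax_label(row))  # 1
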

Eta: The learning rate, which controls the step size at each boosting round.
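
Eta and Num Round work together: each boosting round adds a new tree whose contribution is scaled by eta, so a smaller eta usually needs more rounds to reach the same fit. The toy loop below shows the effect on a single value, under the idealized assumption that every tree predicts the current residual exactly; real trees fit gradients over many rows, and for classification the updates happen on the raw-score (margin) scale rather than on probabilities. Here ``base_score`` plays the role of the Base Score field.

.. code-block:: python

   def boost_single_value(target, base_score, eta, num_round):
       """Toy boosting loop on one value.

       Each round's "tree" is idealized as predicting the current
       residual exactly; eta shrinks how much of it is added, so after
       m rounds the residual is (1 - eta) ** m times the starting one.
       """
       prediction = base_score
       history = [prediction]
       for _ in range(num_round):
           residual = target - prediction  # what the next tree would fit
           prediction += eta * residual    # shrunken update
           history.append(prediction)
       return history


   hist = boost_single_value(target=1.0, base_score=0.5, eta=0.3, num_round=10)
   print(round(hist[1], 4))   # 0.65
   print(round(hist[-1], 4))  # 0.9859
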


Examples
--------
XGBoost Classifier Node Example
+++++++++++++++++++++++++++++++


Scenario:


Suppose we have a customer dataset with features such as age, income and purchase history, and a target variable recording each customer's preferred product category.


Configuration:


1. Features Column: "customer_features"

2. Label Column: "product_category"

3. Prediction Column: "predicted_category"

4. Num Class: 3 (assuming three product categories)

5. Max Depth: 6

6. Num Round: 100

7. Eta: 0.3

8. Objective: "multi:softprob"
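
The configuration above can be written down as plain data and sanity-checked before a run. The sketch below is hypothetical: the dictionary keys mirror the node's field names, but this is not a Fire API, and the checks are illustrative choices (for example, the XGBoost documentation gives eta a range of [0, 1], and multi-class objectives need Num Class set).

.. code-block:: python

   # Hypothetical: keys mirror the node's field names; not a Fire API.
   example_config = {
       "featuresCol": "customer_features",
       "labelCol": "product_category",
       "numClass": 3,
       "maxDepth": 6,
       "numRound": 100,
       "eta": 0.3,
       "objective": "multi:softprob",
   }


   def check_config(cfg):
       """Return a list of problems; an empty list means no issues found.

       Illustrative sanity checks only, not the node's own validation.
       """
       problems = []
       objective = cfg.get("objective", "")
       if objective.startswith("multi:") and cfg.get("numClass", 0) < 2:
           problems.append("multi-class objectives need numClass >= 2")
       if not 0 < cfg.get("eta", 0.3) <= 1:
           problems.append("eta must be in (0, 1]")
       if cfg.get("numRound", 1) < 1:
           problems.append("numRound must be at least 1")
       if cfg.get("maxDepth", 6) < 0:
           problems.append("maxDepth must be non-negative")
       return problems


   print(check_config(example_config))                    # []
   print(check_config(dict(example_config, numClass=1)))  # ['multi-class objectives need numClass >= 2']
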


Execution:


When this node is executed, it trains an XGBoost classification model on the input DataFrame using the specified parameters. The model can then be used to score new customer records, producing a probability for each product category and a predicted category.


Output:


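When the model is used for scoring, the predicted product category is written to the configured prediction column as a numeric class index (0, 1 or 2 for three categories). With the 'multi:softprob' objective, the per-class probabilities are returned as a vector in a separate probability column (named "probability" by default in Spark ML), not in the prediction column.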
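
XGBoost's multi-class objectives expect the label column to hold numeric class indices from 0 to Num Class - 1, so a string column such as "product_category" has to be indexed before training (in Spark this is commonly done with ``StringIndexer``), and predicted indices mapped back to names afterwards. A minimal plain-Python sketch with made-up category names; the helper names are hypothetical, and ordering by descending frequency mirrors ``StringIndexer``'s default ``frequencyDesc`` order, with ties broken alphabetically here.

.. code-block:: python

   from collections import Counter


   def build_label_index(labels):
       """Map each distinct label to 0..k-1, most frequent first,
       ties broken alphabetically."""
       counts = Counter(labels)
       ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
       return {lab: i for i, lab in enumerate(ordered)}


   def decode(predictions, index):
       """Turn predicted class indices (Spark stores them as doubles)
       back into category names."""
       inverse = {i: lab for lab, i in index.items()}
       return [inverse[int(p)] for p in predictions]


   categories = ["books", "games", "books", "music", "games", "books"]
   index = build_label_index(categories)
   print(index)                      # {'books': 0, 'games': 1, 'music': 2}
   encoded = [index[c] for c in categories]
   print(encoded)                    # [0, 1, 0, 2, 1, 0]
   print(decode([2.0, 0.0], index))  # ['music', 'books']
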