XGBoost Regressor
=================


Input
--------------
This node takes a DataFrame as input and fits an XGBoost regression model to it.

Output
--------------
The trained XGBoost model is passed along to the next nodes. The input DataFrame is also passed along to the next nodes.

Type
--------- 

ml-estimator

Class
--------- 

fire.nodes.ml.NodeXGBoostRegressor

Fields
--------- 

.. list-table::
      :widths: 10 5 10
      :header-rows: 1

      * - Name
        - Title
        - Description
      * - featuresCol
        - Features Column
        - Features column of type VectorUDT for model fitting
      * - labelCol
        - Label Column
        - The label column for model fitting
      * - predictionCol
        - Prediction Column
        - The prediction column created during model scoring.
      * - splitRatio
        - Split Ratio
        - Split Ratio
      * - maxDepth
        - Max Depth
        - The maximum depth of a tree
      * - maxBins
        - Max Bins
        - The maximum number of bins used for discretizing continuous features. Must be >= 2 and >= number of categories in any categorical feature.
      * - maxLeaves
        - Max Leaves
        - 
      * - numRound
        - Num Round
        - 
      * - numWorkers
        - Num Workers
        - 
      * - objective
        - Objective
        - 
      * - eta
        - Eta
        - 
      * - regLambda
        - Reg Lambda
        - 
      * - regAlpha
        - Reg Alpha
        - 
      * - subsample
        - Sub Sample
        - 
      * - sampleType
        - Sample Type
        - 
      * - treeMethod
        - Tree Method
        - 
      * - useExternalMemory
        - Use External Memory
        - 
      * - seed
        - Seed
        - 
      * - baseScore
        - Base Score
        - 
      * - minChildWeight
        - Min Child Weight
        - 
      * - colsampleBylevel
        - Col Sample By Level
        - 
      * - colsampleBytree
        - Col Sample By Tree
        - 
      * - minSplitLoss
        - Min Split Loss
        - 
      * - maxDeltaStep
        - Max Delta Step
        - 
      * - sketchEps
        - Sketch Eps
        - 
      * - scalePosWeight
        - Scale Pos Weight
        - 
      * - growPlicy
        - Grow Policy
        - 
      * - normalizeType
        - Normalize Type
        - 
      * - skipDrop
        - Skip Drop
        - 
      * - rateDrop
        - Rate Drop
        - 


Details
-------
For details, see the XGBoost4J-Spark tutorial: https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#xgboost4j-spark-tutorial-version-0-9


XGBoost Regressor Node Details
++++++++++++++++++++++++++++++


This node implements the XGBoost algorithm for regression tasks. It can be used for a variety of regression problems, including predicting continuous values, such as stock prices, house prices, or weather patterns.


Key Parameters:

- Features Column: The name of the column containing the feature vectors used for training.
- Label Column: The name of the column containing the target variable to be predicted.
- Prediction Column: The name of the column where the predicted values will be stored.
- Max Depth: The maximum depth of each tree in the ensemble. Higher values make the model more complex and can lead to overfitting.
- Max Bins: The maximum number of bins to use for histogram-based approximations.
- Max Leaves: The maximum number of leaves per tree.
- Num Round: The number of boosting rounds (trees) to build.
- Num Workers: The number of XGBoost workers used for distributed training. Each worker runs as a Spark task.
- Objective: The objective function to optimize. 'reg:linear' selects regression with squared-error loss; newer XGBoost releases name this objective 'reg:squarederror'.
- Eta: The learning rate, which scales the contribution of each boosting round. Smaller values usually need more rounds.
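
The interaction between Eta and Num Round can be illustrated with a toy sketch. It is not the XGBoost algorithm: each "tree" here is a single constant fitted to the current residuals under squared-error loss, and the function name, the starting prediction of 0 and the sample labels are all assumptions made for illustration.

```python
def boost_constant_learners(targets, eta, num_round):
    """Toy boosting for squared error where each 'tree' is one constant.

    Every round fits the mean of the current residuals and adds it to the
    running prediction, scaled by the learning rate eta.
    """
    prediction = 0.0  # assumed starting score for this sketch
    history = []
    for _ in range(num_round):
        residuals = [y - prediction for y in targets]
        step = sum(residuals) / len(residuals)  # best constant for squared error
        prediction += eta * step                # eta shrinks the step
        history.append(prediction)
    return history


targets = [10.0, 20.0, 30.0]  # made-up labels with mean 20.0
fast = boost_constant_learners(targets, eta=1.0, num_round=3)
slow = boost_constant_learners(targets, eta=0.1, num_round=3)
print(fast)  # reaches the label mean in the first round
print(slow)  # moves toward the label mean in small steps
```

With a small Eta, each round closes only a fraction of the remaining gap, which is why a lower learning rate is typically paired with a larger Num Round.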



Examples
--------
XGBoost Regressor Node Example
++++++++++++++++++++++++++++++


Scenario:


Let's assume we have a dataset containing information about houses, including features like size, number of bedrooms, location, etc., and the corresponding target variable being the house price. 


Configuration:


1. Features Column: "features"

2. Label Column: "price"

3. Prediction Column: "predicted_price"

4. Max Depth: 5

5. Num Round: 100

6. Eta: 0.1

7. Objective: "reg:linear"
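
For readers who want to compare this configuration with the XGBoost parameter documentation, the sketch below rewrites the node's camelCase field names in the snake_case spelling used there (for example ``max_depth`` and ``num_round``). The ``to_snake_case`` helper is hypothetical, and how the node passes its fields to XGBoost internally is not described on this page.

```python
import re

# Values from the example configuration, keyed by the node's field names.
node_config = {
    "maxDepth": 5,
    "numRound": 100,
    "eta": 0.1,
    "objective": "reg:linear",
}


def to_snake_case(name):
    """Convert a camelCase name such as 'maxDepth' to 'max_depth'."""
    # Insert an underscore before every uppercase letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


# Hypothetical mapping, shown only to relate the two naming styles.
xgb_params = {to_snake_case(key): value for key, value in node_config.items()}
print(xgb_params)
```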


Execution:


When this node is executed, the XGBoost algorithm will train a regression model using the specified parameters. The model will then be used to predict the house prices for new data points. 


Output:


The predicted house prices will be stored in the "predicted_price" column of the output dataset.
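
One common follow-up is to compare the "predicted_price" column with the "price" column. The sketch below computes the root mean squared error in plain Python; the rows are invented for illustration and are not output from the node, and the ``rmse`` helper is hypothetical.

```python
import math

# Made-up rows standing in for the node's output dataset.
rows = [
    {"price": 250000.0, "predicted_price": 240000.0},
    {"price": 310000.0, "predicted_price": 325000.0},
    {"price": 180000.0, "predicted_price": 185000.0},
]


def rmse(rows, label_col, prediction_col):
    """Root mean squared error between two numeric columns."""
    squared_errors = [(row[label_col] - row[prediction_col]) ** 2 for row in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


error = rmse(rows, "price", "predicted_price")
print(round(error, 2))
```

A lower value means the predictions are closer to the actual prices on the rows that were scored.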