XGBoost Classifier
Input
Takes a DataFrame as input and trains an XGBoost classification model on it.
Output
The trained XGBoost model is passed along to the next nodes, together with the input DataFrame.
Type
ml-estimator
Class
fire.nodes.ml.NodeXGBoostClassifier
Fields
| Name | Title | Description |
|---|---|---|
| featuresCol | Features Column | Features column of type vectorUDT for model fitting |
| labelCol | Label Column | The label column for model fitting |
| predictionCol | Prediction Column | The prediction column created during model scoring. |
| splitRatio | Split Ratio | Split Ratio |
| numClass | Num Class | |
| maxDepth | Max Depth | The maximum depth of a tree |
| maxBins | Max Bins | The maximum number of bins used for discretizing continuous features. Must be >= 2 and >= number of categories in any categorical feature. |
| maxLeaves | Max Leaves | |
| numRound | Num Round | |
| numWorkers | Num Workers | |
| objective | Objective | |
| eta | Eta | |
| regLambda | Reg Lambda | |
| regAlpha | Reg Alpha | |
| subsample | Sub Sample | |
| sampleType | Sample Type | |
| treeMethod | Tree Method | |
| useExternalMemory | Use External Memory | |
| seed | Seed | |
| baseScore | Base Score | |
| minChildWeight | Min Child Weight | |
| colsampleBylevel | Col Sample By Level | |
| colsampleBytree | Col Sample By Tree | |
| minSplitLoss | Min Split Loss | |
| maxDeltaStep | Max Delta Step | |
| sketchEps | Sketch Eps | |
| scalePosWeight | Scale Pos Weight | |
| growPlicy | Grow Policy | |
| normalizeType | Normalize Type | |
| skipDrop | Skip Drop | |
| rateDrop | Rate Drop | |
Details
XGBoost Classifier Node Details
This node implements the XGBoost algorithm for classification tasks. It can be used for a variety of classification problems, including binary classification (e.g., spam detection) and multi-class classification (e.g., image recognition).
Key Parameters:
Features Column: The name of the column containing the features used for training.
Label Column: The name of the column containing the target variable to be predicted.
Prediction Column: The name of the column created during scoring to hold the model’s predictions.
Num Class: The number of classes in the classification problem.
Max Depth: The maximum depth of each tree in the ensemble. Higher values can lead to overfitting.
Max Bins: The maximum number of bins to use for histogram-based approximations.
Max Leaves: The maximum number of leaves per tree.
Num Round: The number of boosting rounds (trees) to build.
Num Workers: The number of parallel XGBoost workers used for distributed training.
Objective: The objective function to optimize, for example ‘binary:logistic’ for binary classification or ‘multi:softprob’ for multi-class classification.
Eta: The learning rate, which controls the step size at each boosting round.
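The interaction between Eta and Num Round can be illustrated with a deliberately simplified sketch. It assumes an idealized tree that fits the current residual exactly, which real XGBoost trees do not, and the function name `boosted_prediction` is hypothetical. It is not the node's implementation; it only shows why a smaller Eta generally needs more rounds to reach the same fit.

```python
def boosted_prediction(target, eta, num_round, base_score=0.0):
    """Toy boosting loop (assumption: each 'tree' fits the residual exactly)."""
    prediction = base_score
    for _ in range(num_round):
        residual = target - prediction   # what the idealized tree would learn
        prediction += eta * residual     # eta shrinks each tree's contribution
    return prediction


# Compare how close the prediction gets to a target of 1.0
for eta in (0.3, 0.1):
    for num_round in (10, 100):
        value = boosted_prediction(1.0, eta, num_round)
        print(f"eta={eta} numRound={num_round} prediction={value:.4f}")
```

In this toy setting the prediction after n rounds equals target * (1 - (1 - eta)^n), so lowering Eta slows convergence and raising Num Round compensates for it.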
Examples
XGBoost Classifier Node Example
Scenario:
Assume a dataset of customers with features such as age, income, and purchase history, where the target variable is each customer’s preferred product category.
Configuration:
Features Column: “customer_features”
Label Column: “product_category”
Prediction Column: “predicted_probabilities”
Num Class: 3 (assuming three product categories)
Max Depth: 6
Num Round: 100
Eta: 0.3
Objective: “multi:softprob”
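The configuration above can be written down as a plain dictionary keyed by the field names from the Fields table, which makes it easy to sanity-check before running the node. This is only a sketch: the `check_config` helper and its rules are assumptions chosen for illustration (for example, Eta within [0, 1] and at least two classes for a multi-class objective), not part of the node's API.

```python
# Example settings, keyed by the node's field names
config = {
    "featuresCol": "customer_features",
    "labelCol": "product_category",
    "predictionCol": "predicted_probabilities",
    "numClass": 3,
    "maxDepth": 6,
    "numRound": 100,
    "eta": 0.3,
    "objective": "multi:softprob",
}


def check_config(cfg):
    """Hypothetical helper: return a list of problems found in the settings."""
    problems = []
    if cfg["objective"].startswith("multi:") and cfg.get("numClass", 0) < 2:
        problems.append("multi-class objectives need numClass >= 2")
    if not 0.0 <= cfg["eta"] <= 1.0:
        problems.append("eta should be between 0 and 1")
    if cfg["numRound"] < 1:
        problems.append("numRound should be at least 1")
    if cfg["maxDepth"] < 0:
        problems.append("maxDepth should not be negative")
    return problems


print(check_config(config))  # an empty list means no problems were found
```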
Execution:
When this node is executed, the XGBoost algorithm will train a classification model using the specified parameters. The model will then be used to predict the probabilities of each product category for new customer data points.
Output:
The predicted probabilities for each product category are stored in the “predicted_probabilities” column of the output dataset. With the “multi:softprob” objective, XGBoost produces one probability per class, so each row is expected to hold three values that sum to 1.
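As a rough illustration of what such a probability vector means, the sketch below turns per-class scores into probabilities with a softmax, which is the transformation the “multi:softprob” objective applies, and then picks the most likely class. The category names and raw scores are made up for the example, and the exact column type produced by the node is not shown here.

```python
import math


def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    shift = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


categories = ["electronics", "clothing", "groceries"]  # hypothetical class names
raw_scores = [2.0, 0.5, -1.0]                           # hypothetical per-class scores

probabilities = softmax(raw_scores)

# The predicted class is the one with the highest probability
best = max(range(len(probabilities)), key=lambda i: probabilities[i])
print(categories[best])  # prints "electronics"
```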