H2O XGBoost

Input

It takes two DataFrames as input: one for training and the other for validation.

Type

ml-estimator

Class

fire.nodes.h2o.NodeH2OXGBoost

Fields

Name

Title

Description

isResponseIsCategorical

Is Response Column Categorical

Specifies the response column type (numeric or categorical), which determines whether the model is trained for regression or classification.

labelCol

Label Column

Response variable column.

featuresCols

Feature Columns

Features to be used for Modelling

path

Path

Save Confusion Matrix to Path

columnsToCategorical

Columns to Categorical

Columns to be encoded as categorical.

seed

Seed

Seed for pseudo random number generator (if applicable).

splitRatio

Split Ratio

Split Ratio

nfolds

Number of Folds

Number of folds for K-fold cross-validation (0 to disable or >= 2).

ntrees

Number of Trees

Number of trees.

maxDepth

Max Depth

Maximum tree depth (0 for unlimited).

minRows

Min Rows

Fewest allowed (weighted) observations in a leaf.

maxBins

Max Bins

For tree_method=hist only: maximum number of bins.

maxLeaves

Max Leaves

For tree_method=hist only: maximum number of leaves.

treeMethod

Tree Method

Tree method.

growPolicy

Grow Policy

Grow policy.

booster

Booster

Booster

eta

Eta

(same as learn_rate) Learning rate (from 0.0 to 1.0).

sampleRate

Sample Rate

(same as subsample) Row sample rate per tree (from 0.0 to 1.0).

categoricalEncoding

Categorical Encoding

Specify one of the various encoding schemes for handling categorical features

ignoreConstCols

Ignore Const Columns

Ignore constant columns.

scoreEachIteration

Score Each Iteration

Whether to score during each iteration of model training.

stoppingRounds

Stopping Rounds

Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

maxRuntimeSecs

Max Runtime Secs

This argument specifies the maximum time that the AutoML process will run for. If both max_runtime_secs and max_models are specified, then the AutoML run will stop as soon as it hits either of these limits. If neither max_runtime_secs nor max_models are specified, then max_runtime_secs defaults to 3600 seconds (1 hour).

stoppingMetric

Stopping Metric

Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)

stoppingTolerance

Stopping Tolerance

Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
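The interaction between stoppingRounds and stoppingTolerance can be illustrated with a small sketch. It is a simplified reading of the descriptions above, not H2O's actual implementation: the function names are hypothetical, the metric is assumed to be positive and lower-is-better (such as logloss), and the exact windowing H2O uses may differ.

```python
from typing import List


def moving_averages(values: List[float], k: int) -> List[float]:
    """Simple moving averages of window length k over the scoring history."""
    return [sum(values[i - k:i]) / k for i in range(k, len(values) + 1)]


def should_stop(history: List[float], stopping_rounds: int,
                stopping_tolerance: float) -> bool:
    """Return True when a positive, lower-is-better metric stopped improving.

    Simplified rule (an assumption, not H2O's exact code): compare the best
    of the latest `stopping_rounds` moving averages against the moving
    average just before them, and require a relative improvement of at
    least `stopping_tolerance`.
    """
    k = stopping_rounds
    if k <= 0:                    # 0 disables early stopping
        return False
    if len(history) < 2 * k:      # not enough scoring events yet
        return False
    avgs = moving_averages(history, k)
    reference = avgs[-(k + 1)]    # moving average from k scoring events ago
    best_recent = min(avgs[-k:])  # best of the most recent k averages
    return best_recent > reference * (1.0 - stopping_tolerance)


# A metric that keeps falling does not trigger a stop; a flat one does.
improving = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
flat = [0.9, 0.5, 0.5, 0.5, 0.5, 0.5]
print(should_stop(improving, stopping_rounds=2, stopping_tolerance=0.01))
print(should_stop(flat, stopping_rounds=2, stopping_tolerance=0.01))
```

Averaging over a window rather than comparing single scoring events makes the rule less sensitive to noise in individual scores.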

gainsliftBins

Gains Lift Bins

Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning.

withContributions

With Contributions

Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values.

advanced

Advanced

convertUnknownCategoricalLevelsToNa

Convert Unknown Categorical Levels to NA

If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.

withLeafNodeAssignments

With Node Assignments

Enables or disables computation of leaf node assignments.

withStageResults

With Stage Results

Enables or disables computation of stage results.

minChildWeight

Min Child Weight

(same as min_rows) Fewest allowed (weighted) observations in a leaf.

learnRate

Learn Rate

(Same as eta) Learning rate (from 0.0 to 1.0).

subsample

Sample Rate

(same as sample_rate) Row sample rate per tree (from 0.0 to 1.0).

colSampleRate

Column Sample Rate

Column sample rate (from 0.0 to 1.0).

colSampleByLevel

Column Sample By Level

(same as col_sample_rate) Column sample rate (from 0.0 to 1.0).

colSampleRatePerTree

Column Sample Rate Per Tree

(same as colsample_bytree) Column sample rate per tree (from 0.0 to 1.0).

colSampleByTree

Column Sample By Tree

(same as col_sample_rate_per_tree) Column sample rate per tree (from 0.0 to 1.0).

colSampleByNode

Column Sample By Node

Column sample rate per tree node (from 0.0 to 1.0).

maxAbsLeafnodePred

Max Absolute Leaf Node Prediction

(same as max_delta_step) Maximum absolute value of a leaf node prediction.

maxDeltaStep

Max Delta Step

(same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction.

scoreTreeInterval

Score Tree Interval

Score the model after every so many trees. Disabled if set to 0.

minSplitImprovement

Minimum Split Improvement

gamma

Gamma

nthreads

Number of Threads

Number of parallel threads that can be used to run XGBoost. Cannot exceed H2O cluster limits (-nthreads parameter). Defaults to maximum available.

buildTreeOneNode

Build tree one node

Runs tree building on a single node.

calibrateModel

Calibrate Model

Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.

regLambda

Reg Lambda

L2 regularization.

regAlpha

Reg Alpha

L1 regularization.

quietMode

Quiet mode

Enable quiet mode for less output to standard output.

sampleType

Sample Type

For booster=dart only: sample_type

normalizeType

Normalize Type

For booster=dart only: normalize_type

rateDrop

Rate Drop

For booster=dart only: rate_drop (0..1).

oneDrop

One Drop

For booster=dart only: one_drop.

skipDrop

Skip Drop

For booster=dart only: skip_drop (0..1).

dmatrixType

Dmatrix Type

Type of DMatrix. For sparse, NAs and 0 are treated equally.

scalePosWeight

Scaled Pos Weight

Controls the effect of observations with positive labels in relation to the observations with negative labels on gradient calculation. Useful for imbalanced problems.
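A common starting point for scalePosWeight on imbalanced binary problems, suggested in the XGBoost parameter documentation, is the ratio of negative to positive observations. The helper below is a hypothetical sketch of that heuristic and assumes labels are encoded as 0 and 1; it is not part of this node.

```python
from typing import Sequence


def suggested_scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative to positive labels, a common starting heuristic."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive labels; the ratio is undefined")
    return negatives / positives


# Example: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
print(suggested_scale_pos_weight(labels))
```

The resulting value is only a starting point; it is usually worth tuning it alongside the other parameters.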

keepCrossValidationModels

Keep Cross Validation Models

Whether to keep the cross-validated models. Keeping cross-validation models may consume significantly more memory in the H2O cluster.

keepCrossValidationPredictions

Keep Cross Validation Predictions

Whether to keep the cross-validation predictions. This needs to be set to TRUE if running the same AutoML object for repeated runs because CV predictions are required to build additional Stacked Ensemble models in AutoML.

keepCrossValidationFoldAssignment

Keep Cross Validation Fold Assignment

Whether to keep cross-validation assignments.

distribution

Distribution

Distribution function.

tweediePower

Tweedie Power

Tweedie power for Tweedie regression, must be between 1 and 2.

quantileAlpha

Quantile Alpha

Desired quantile for Quantile regression, must be between 0 and 1.

huberAlpha

Huber Alpha

Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).

weightCol

Weight Column

Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.
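The equivalence described above (a weight of 2 behaves like a repeated row, a weight of 0 like a removed row) can be checked on a weighted squared-error loss. This is an illustrative sketch with a hypothetical function name; the loss XGBoost actually optimizes depends on the chosen distribution.

```python
from typing import Sequence


def weighted_squared_error(y_true: Sequence[float], y_pred: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Sum of per-row squared errors, each scaled by its observation weight."""
    return sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))


# Three rows with weights 1, 2 and 0.
loss_weighted = weighted_squared_error(
    y_true=[1.0, 2.0, 5.0], y_pred=[1.5, 1.0, 0.0], weights=[1, 2, 0])

# Same data with the second row repeated and the third row dropped.
loss_expanded = weighted_squared_error(
    y_true=[1.0, 2.0, 2.0], y_pred=[1.5, 1.0, 1.0], weights=[1, 1, 1])

print(loss_weighted == loss_expanded)
```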

offsetCol

Offset Column

Offset column. This will be added to the combination of columns before applying the link function.

foldCol

Fold Column

Column with cross-validation fold index assignment per observation.

foldAssignment

Fold Assignment

Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.

aucType

AUC Type

Set default multinomial AUC type.

confusionMatrix

Confusion Matrix

output_confusion_matrix_chart

Output Confusion Matrix Chart

Whether to display the confusion matrix chart.

cm_chart_title

Confusion Matrix Chart Title

Title name to display in Confusion Matrix Chart

cm_chart_description

Confusion Matrix Chart Description

Description to display in Confusion Matrix Chart

confusionMatrixTargetLegend

Confusion Matrix Target Legend

Legend name to display for Target in Confusion Matrix

confusionMatrixPredictedLabelLegend

Confusion Matrix PredictedLabel Legend

Legend name to display for Predicted Label in Confusion Matrix

confusionMatrixCountLegend

Confusion Matrix Count Legend

Legend name to display for Count in Confusion Matrix

Description

Confusion Matrix Description

confusionMatrixRowDescription

Confusion Matrix Outcome description

One can provide the business details of the outcome of the confusion matrix rows

ROC Curve

ROC Curve

output_roc_curve

Output ROC Curve

Whether to display the ROC curve chart.

roc_title

ROC Curve Chart Title

Title name to display in ROC Curve Chart

roc_description

ROC Curve Chart Description

Add Description for ROC Curve Chart

xlabel

X Label

X label

ylabel

Y Label

Y Label

Grid Search

Grid Search

paramKeys

Param Name

Param Names, e.g. maxDepth, learnRate, nTrees

paramValues

Param Value

Param Values, e.g. 4,5,6

gridStrategy

Grid Search Strategy

Strategy to use for model hyperparameter search. Cartesian does exhaustive search; RandomDiscrete searches randomly within given time or model limits.

gridMaxModels

Grid Max Models

Maximum number of models to build in the grid search (0 for unlimited).

gridMaxRuntimeSecs

Grid Max Runtime Seconds

Maximum runtime in seconds for the grid search (0 for unlimited).

gridStoppingRounds

Grid Stopping Rounds

Early stopping based on convergence of the metric during grid search (0 to disable).

gridStoppingTolerance

Grid Stopping Tolerance

Tolerance for metric-based stopping criterion during grid search.

gridStoppingMetric

Grid Stopping Metric

Metric to use for early stopping during grid search (AUTO: logloss for classification, deviance for regression).

gridParallelism

Grid Parallelism

Level of parallelism to use when building models in the grid.

gridSelectBestModelBy

Grid Select Best Model By

Metric used to select the best model from the grid.
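The difference between the Cartesian and RandomDiscrete values of gridStrategy can be sketched in plain Python. This is an illustration only: the function names are hypothetical, the dictionary layout is an assumption about how paramKeys and paramValues combine, and the sketch models only the gridMaxModels limit, not the runtime or early-stopping limits.

```python
import itertools
import random
from typing import Any, Dict, List


def cartesian_grid(param_grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Every combination of the listed values (exhaustive search)."""
    keys = list(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


def random_discrete_grid(param_grid: Dict[str, List[Any]], max_models: int,
                         seed: int) -> List[Dict[str, Any]]:
    """Randomly ordered combinations, cut off at max_models (0 = unlimited)."""
    combos = cartesian_grid(param_grid)
    random.Random(seed).shuffle(combos)
    return combos if max_models == 0 else combos[:max_models]


grid = {"maxDepth": [4, 5, 6], "learnRate": [0.1, 0.3]}
all_models = cartesian_grid(grid)                              # 3 * 2 combinations
sampled = random_discrete_grid(grid, max_models=2, seed=42)    # 2 of them
print(len(all_models), len(sampled))
```

Because the Cartesian grid grows multiplicatively with each added parameter, a random strategy with a model or time limit is often the more practical choice for larger grids.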