H2O Auto ML¶

H2O AutoML

Input¶

It takes a DataFrame as input.

Type¶

ml-estimator

Class¶

fire.nodes.h2o.NodeH2OAutoML

Fields¶

Name	Title	Description
isResponseIsCategorical	Is Response Column Categorical	Whether the response column is categorical or numeric. A categorical response selects classification; a numeric response selects regression.
labelCol	Label Column	Response variable column.
featuresCols	Feature Columns	Features to be used for Modelling
columnsToCategorical	Columns to Categorical	Columns to be encoded as categorical.
path	Path	Model Save Path.
seed	Seed	Seed for random number generator; set to a value other than -1 for reproducibility.
balanceClasses	Balance Classes	Balance training data class counts via over/under-sampling (for imbalanced data).
nfolds	N Folds	Number of folds for k-fold cross-validation (defaults to -1 (AUTO), otherwise it must be >=2 or use 0 to disable).
maxModels	Max Models	Maximum number of models to build (optional). Always set this parameter to ensure AutoML reproducibility: all models are then trained until convergence and none is constrained by a time budget.
includeAlgos	Include Algos	A list of algorithms to restrict to during the model-building phase. By default, all algorithms are included.
maxRuntimeSecs	Max Runtime Secs	This argument specifies the maximum time, in seconds, that the AutoML process will run for. If both max_runtime_secs and max_models are specified, then the AutoML run will stop as soon as it hits either of these limits. If neither max_runtime_secs nor max_models is specified, then max_runtime_secs defaults to 3600 seconds (1 hour).
stoppingRounds	Stopping Rounds	Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).
stoppingMetric	Stopping Metric	Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
stoppingTolerance	Stopping Tolerance	Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
withContributions	With Contributions	Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values.
advanced	Advanced
convertUnknownCategoricalLevelsToNa	Convert Unknown Categorical Levels to NA	If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.
sortMetric	Sort Metric	Metric to sort the leaderboard by. Defaults to AUTO
maxRuntimeSecsPerModel	Max Runtime in Seconds per Model	Maximum time to spend on each individual model (optional). Note that models constrained by a time budget are not guaranteed reproducible.
predictionCol	Prediction Column	Prediction column name
detailedPredictionCol	Detailed Prediction column	Column containing additional prediction details, its content depends on the model type
withLeafNodeAssignments	With Node Assignments	Enables or disables computation of leaf node assignments.
withStageResults	With Stage Results	Enables or disables computation of stage results.
maxAfterBalanceSize	Max After Balance Size	Maximum relative size of the training data after balancing class counts (defaults to 5.0 and can be less than 1.0). Requires balance_classes.
keepCrossValidationPredictions	Keep Cross Validation Predictions	Whether to keep the cross-validation predictions. This must be set to TRUE when running the same AutoML object repeatedly, because CV predictions are required to build additional Stacked Ensemble models in AutoML.
keepCrossValidationModels	Keep Cross Validation Models	Whether to keep the cross-validated models. Keeping cross-validation models may consume significantly more memory in the H2O cluster.
keepCrossValidationFoldAssignment	Keep Cross Validation Fold Assignment	Whether to keep cross-validation assignments.
distribution	Distribution	Distribution function used by algorithms that support it; other algorithms use their defaults.
tweediePower	Tweedie Power	Tweedie power for Tweedie regression, must be between 1 and 2.
quantileAlpha	Quantile Alpha	Desired quantile for Quantile regression, must be between 0 and 1.
huberAlpha	Huber Alpha	Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
exploitationRatio	Exploitation Ratio	The budget ratio (between 0 and 1) dedicated to the exploitation (vs exploration) phase.
foldCol	Fold Column	Column with cross-validation fold index assignment per observation.
weightCol	Weight Column	Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. A weight is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the prediction returned for that row is zero, which is incorrect. To get an accurate prediction, remove all rows with weight == 0.
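
The stoppingRounds, stoppingMetric and stoppingTolerance fields work together, and the rule they describe can be sketched in a few lines of Python. This is a simplified illustration, not H2O's implementation: the function name `should_stop` is hypothetical, the metric is assumed to be positive and lower-is-better (as with logloss or deviance), and the exact number of scoring events H2O waits for before applying the rule may differ.

```python
def should_stop(scores, stopping_rounds, stopping_tolerance):
    """Return True when a lower-is-better metric has stopped improving.

    scores: metric value at each scoring event, oldest first (assumed positive).
    """
    k = stopping_rounds
    if k == 0:
        return False  # 0 disables early stopping
    if len(scores) < 2 * k:
        return False  # not enough scoring events to compare yet

    recent = scores[-2 * k:]
    # k + 1 simple moving averages of window length k over the last 2k scores.
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference = averages[0]          # average just before the last k events
    best_recent = min(averages[1:])  # best average within the last k events

    # An improvement only counts if it beats the reference by the
    # relative tolerance.
    improved = best_recent < reference * (1 - stopping_tolerance)
    return not improved
```

With this sketch, a metric that keeps falling lets training continue, while a metric that stays flat over the last 2 × stoppingRounds scoring events triggers a stop.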

Details¶

H2O AutoML supports both regression and classification. The H2O AutoML interface is designed to have as few parameters as possible, so that all the user needs to do is point to their dataset, identify the response column, and optionally specify a time constraint or a limit on the total number of models trained.
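
For orientation, the same workflow can be sketched with the H2O Python client that the linked H2O documentation covers. How this node calls H2O internally is not stated here, so treat this as an assumption-laden illustration: the helper names `automl_params` and `run_automl` and their arguments are hypothetical, and the snake_case parameter names are those of the H2O client, not this node's field names.

```python
def automl_params(**settings):
    """Keep only the settings that were actually provided (hypothetical helper)."""
    return {name: value for name, value in settings.items() if value is not None}


def run_automl(csv_path, label_col, feature_cols, categorical_response,
               max_models=None, max_runtime_secs=None, seed=None):
    """Train AutoML on a CSV file and return the leaderboard (sketch)."""
    # Imported lazily so automl_params works without an H2O installation.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)

    # A categorical response selects classification; numeric selects regression.
    if categorical_response:
        frame[label_col] = frame[label_col].asfactor()

    aml = H2OAutoML(**automl_params(max_models=max_models,
                                    max_runtime_secs=max_runtime_secs,
                                    seed=seed))
    aml.train(x=feature_cols, y=label_col, training_frame=frame)
    return aml.leaderboard
```

Passing only a file, a response column, and optionally `max_models` or `max_runtime_secs` mirrors the minimal configuration described above; every other setting keeps its H2O default.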

More details are available at: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html#automl-automatic-machine-learning