H2O Neural Network
===========

H2O Deep Learning is based on a multi-layer feedforward artificial neural network that is trained with stochastic gradient descent using back-propagation.

Input
--------------
It takes in a DataFrame as input

Type
--------- 

ml-estimator

Class
--------- 

fire.nodes.h2o.NodeH2ONeuralNetwork

Fields
--------- 

.. list-table::
      :widths: 10 5 10
      :header-rows: 1

      * - Name
        - Title
        - Description
      * - isResponseIsCategorical
        - isResponseColIsCategorical
        - Specify a response column type(numeric or categorical). Separates the Classification and Regression
      * - labelCol
        - Label Column
        - Response variable column.
      * - ignoredCols
        - Ignored Columns
        - Features to be ignored for Modelling
      * - splitRatio
        - Split Ratio
        - Split Ratio
      * - columnsToCategorical
        - Columns to Categorical
        - Columns to be Categorical encoded
      * - seed
        - Seed
        - Seed for pseudo random number generator (if applicable).
      * - balanceClasses
        - Balance Classes
        - Balance training data class counts via over/under-sampling (for imbalanced data).
      * - maxAfterBalanceSize
        - Max After Balance Size
        - Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
      * - activation
        - Activation
        - Activation function.
      * - hidden
        - Hidden
        - Specify the hidden layer sizes (value must be positive)
      * - epochs
        - Epochs
        - SHow many times the dataset should be iterated (streamed), can be fractional.
      * - nfolds
        - Number of Folds
        - Number of folds for K-fold cross-validation (0 to disable or >= 2).
      * - trainSamplesPerIteration
        - Train Samples Per Iteration
        - Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.
      * - targetRatioCommToComp
        - Target ratio comm to comp
        - Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).
      * - categoricalEncoding
        - Categorical Encoding
        - Specify one of the various encoding schemes for handling categorical features
      * - ignoreConstCols
        - Ignore Const Columns
        - Ignore constant columns.
      * - scoreEachIteration
        - Score Each Iteration
        - Whether to score during each iteration of model training.
      * - stoppingRounds
        - Stopping Rounds
        - Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).
      * - maxRuntimeSecs
        - Max Runtime Secs
        - his argument specifies the maximum time that the AutoML process will run for. If both max_runtime_secs and max_models are specified, then the AutoML run will stop as soon as it hits either of these limits. If neither max_runtime_secs nor max_models are specified, then max_runtime_secs defaults to 3600 seconds (1 hour).
      * - stoppingMetric
        - StoppingMetric
        - Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
      * - stoppingTolerance
        - StoppingTolerance
        - Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
      * - standardize
        - Standardize
        - If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.
      * - loss
        - Loss
        - SLoss function.
      * - adaptiveRate
        - Adaptive Rate
        - Adaptive learning rate.
      * - rho
        - Rho
        - Adaptive learning rate time decay factor (similarity to prior updates).
      * - advanced
        - Advanced
        - 
      * - epsilon
        - Epsilon
        - Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).
      * - rate
        - Rate
        - Learning rate (higher => less stable, lower => slower convergence).
      * - rateAnnealing
        - Rate Annealing
        - Learning rate annealing: rate / (1 + rate_annealing * samples).
      * - rateDecay
        - Rate Decay
        - Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (n - 1)
      * - momentumStart
        - Momentum Start
        - Specify the initial momentum at the beginning of training; we suggest 0.5 (Applicable only if adaptive_rate is disabled)
      * - momentumRamp
        - Momentum Ramp
        - Number of training samples for which momentum increases.
      * - momentumStable
        - Momentum Stable
        - Final momentum after the ramp is over (try 0.99).
      * - nesterovAcceleratedGradient
        - Nesterov Accelerated Gradient
        - Use Nesterov accelerated gradient (recommended).
      * - inputDropoutRatio
        - Input Dropout Ratio
        - Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).
      * - inputDropoutRatio
        - Hidden Dropout Ratios
        - Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5.
      * - l1
        - L1
        - L1 regularization (can add stability and improve generalization, causes many weights to become 0).
      * - l2
        - L2
        - L2 regularization (can add stability and improve generalization, causes many weights to be small.
      * - maxW2
        - Max W2
        - Constraint for squared sum of incoming weights per unit (e.g. for Rectifier).
      * - initialWeightDistribution
        - Initial Weight Distribution
        - Initial weight distribution.
      * - initialWeightScale
        - Initial Weight Scale
        - Uniform: -value...value, Normal: stddev.
      * - scoreInterval
        - Score interval
        - Shortest time interval (in seconds) between model scoring.
      * - scoreTrainingSamples
        - Score Training Samples
        - Number of training set samples for scoring (0 for all).
      * - scoreValidationSamples
        - Score Validation Samples
        - Number of validation set samples for scoring (0 for all).
      * - scoreDutyCycle
        - Score Duty Cycle
        - Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).
      * - classificationStop
        - Classification Stop
        - Stopping criterion for classification error fraction on training data (-1 to disable).
      * - regressionStop
        - Regression Stop
        - Stopping criterion for regression error (MSE) on training data (-1 to disable).
      * - quietMode
        - Quiet mode
        - Enable quiet mode for less output to standard output.
      * - scoreValidationSampling
        - Score Validation Sampling
        - Method used to sample validation dataset for scoring.
      * - overwriteWithBestModel
        - Overwrite With Best Model
        - If enabled, override the final model with the best model found during training.
      * - useAllFactorLevels
        - Use All Factor Levels
        - Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.
      * - diagnostics
        - Diagnostics
        - Enable diagnostics for hidden layers.
      * - calculateFeatureImportances
        - Calculate Feature Importances
        - Compute variable importances for input features (Gedeon method) - can be slow for large networks.
      * - fastMode
        - Fast Mode
        - SEnable fast mode (minor approximation in back-propagation).
      * - forceLoadBalance
        - Force Load Balance
        - Force extra load balancing to increase training speed for small datasets (to keep all cores busy).
      * - replicateTrainingData
        - Replicate Training Data
        - Replicate the entire training dataset onto every node for faster training on small datasets.
      * - singleNodeMode
        - Single Node Mode
        - Run on a single node for fine-tuning of model parameters.
      * - shuffleTrainingData
        - Shuffle Training Data
        - Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, of if using balance_classes).
      * - missingValuesHandling
        - Missing Values Handling
        - Handling of missing values. Either MeanImputation or Skip.
      * - sparse
        - Sparse
        - Sparse data handling (more efficient for data with lots of 0 values).
      * - averageActivation
        - Average Activation
        - Average activation for sparse auto-encoder. #Experimental.
      * - sparsityBeta
        - Sparsity Beta
        - Sparsity regularization. #Experimental.
      * - maxCategoricalFeatures
        - Max Categorical Features
        - Max. number of categorical features, enforced via hashing. #Experimental.
      * - reproducible
        - Reproducible
        - Force reproducibility on small data (will be slow - only uses 1 thread).
      * - exportWeightsAndBiases
        - Export Weights And Biases
        - Whether to export Neural Network weights and biases to H2O Frames.
      * - miniBatchSize
        - Mini Batch Size
        - SMini-batch size (smaller leads to better fit, larger can speed up and generalize better).
      * - elasticAveraging
        - Elastic Averaging
        - Elastic averaging between compute nodes can improve distributed model convergence. #Experimental.
      * - elasticAveragingMovingRate
        - Elastic Averaging Moving Rate
        - Elastic averaging moving rate (only if elastic averaging is enabled).
      * - elasticAveragingRegularization
        - Elastic Averaging Regularization
        - Elastic averaging regularization strength (only if elastic averaging is enabled).
      * - keepCrossValidationModels
        - Keep Cross Validation Models
        - Whether to keep the cross-validated models. Keeping cross-validation models may consume significantly more memory in the H2O cluster.
      * - keepCrossValidationPredictions
        - Keep Cross Validation Predictions
        - Whether to keep the predictions of the cross-validation predictions. This needs to be set to TRUE if running the same AutoML object for repeated runs because CV predictions are required to build additional Stacked Ensemble models in AutoML.
      * - keepCrossValidationFoldAssignment
        - Keep Cross Validation Fold Assignment
        - Whether to keep cross-validation assignments.
      * - distribution
        - Distribution
        - Distribution function.)
      * - tweediePower
        - Tweedie Power
        - Tweedie power for Tweedie regression, must be between 1 and 2.
      * - quantileAlpha
        - Quantile Alhpa
        - Desired quantile for Quantile regression, must be between 0 and 1.
      * - huberAlpha
        - Huber Alpha
        - Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
      * - weightCol
        - Weight Column
        - Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.
      * - offsetCol
        - Offset Column
        - Offset column. This will be added to the combination of columns before applying the link function.
      * - foldCol
        - Fold Column
        - Column with cross-validation fold index assignment per observation.
      * - foldAssignment
        - Fold Assignment
        - Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
      * - aucType
        - AUC Type
        - Set default multinomial AUC type.
      * - confusionMatrix
        - Confusion Matrix
        - 
      * - output_confusion_matrix_chart
        - Output Confusion Matrix Chart
        - whether to display confusion matrix chart.
      * - cm_chart_title
        - Confusion Matrix Chart Title
        - Title name to display in Confusion Matrix Chart
      * - cm_chart_description
        - Confusion Matrix Chart Description
        - Description to display in Confusion Matrix CHart
      * - confusionMatrixTargetLegend
        - Confusion Matrix Target Legend
        - Legend name to display for Target in Confusion Matrix
      * - confusionMatrixPredictedLabelLegend
        - Confusion Matrix PredictedLabel Legend
        - Legend name to display for Predicted Label in Confusion Matrix
      * - ROC Curve
        - ROC Curve
        - 
      * - output_roc_curve
        - Output ROC Curve
        - Whether to display confusion matrix chart.
      * - roc_title
        - ROC Curve Chart Title
        - Title name to display in ROC Curve Chart
      * - roc_description
        - ROC Curve Chart Description
        - Add Description for ROC Curve Chart
      * - xlabel
        - X Label
        - X label
      * - ylabel
        - Y Label
        - Y Label
      * - Grid Search
        - Grid Search
        - 
      * - paramKeys
        - Param Name
        - Param Names. eg: l1 ,hidden
      * - paramValues
        - Param Value
        - Enter comma separated values.eg: 0, 1e-5, eg: 50,50;100,100
      * - gridStrategy
        - Grid Search Strategy
        - Strategy to use for model hyperparameter search. Cartesian does exhaustive search; RandomDiscrete searches randomly within given time or model limits.
      * - gridMaxModels
        - Grid Max Models
        - Maximum number of models to build in the grid search (0 for unlimited).
      * - gridMaxRuntimeSecs
        - Grid Max Runtime Seconds
        - Maximum runtime in seconds for the grid search (0 for unlimited).
      * - gridStoppingRounds
        - Grid Stopping Rounds
        - Early stopping based on convergence of the metric during grid search (0 to disable).
      * - gridStoppingTolerance
        - Grid Stopping Tolerance
        - Tolerance for metric-based stopping criterion during grid search.
      * - gridStoppingMetric
        - Grid Stopping Metric
        - Metric to use for early stopping during grid search (AUTO: logloss for classification, deviance for regression).
      * - gridParallelism
        - Grid Parallelism
        - Level of parallelism to use when building models in the grid.
      * - gridSelectBestModelBy
        - Grid Select Best Model By
        - Metric used to select the best model from the grid.


Details
-------
H2O’s Deep Learning is based on a multi-layer feedforward artificial neural network that is trained with stochastic gradient descent using back-propagation. The network can contain a large number of hidden layers consisting of neurons with tanh, rectifier, and maxout activation functions.


More details are available at : http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/deep-learning.html