Common experiment settings

All supported problem types have certain common experiment settings; below, they are listed and described in their respective categories.

Note

The settings for an experiment in H2O Hydrogen Torch are divided into the following categories: Dataset, Augmentation, Architecture, Training, Prediction, Environment, Tokenizer, and Logging.

Dataset settings

Validation strategy

Specifies the validation strategy H2O Hydrogen Torch will use for the experiment.

Note

To properly assess the performance of your trained models, it is common practice to evaluate them on separate holdout data that the model has not seen during training. H2O Hydrogen Torch allows you to specify different strategies for this task to fit your needs.

Options
  • K-fold cross-validation

    Splits the data using the optional fold column provided in the train data, or performs an automatic 5-fold cross-validation.

  • Grouped k-fold cross validation

    Lets you specify a group column based on which the data is split into folds.

  • Custom holdout validation

    Specifies a separate holdout dataframe.

Folds

Defines the validation folds in case of cross-validation; a separate model is trained for each value selected. Each model will use the corresponding part of the data as a holdout sample to assess performance while the model is fitted to the rest of the records from the training dataframe. As a result, folds estimate how the model will perform in general when used to make predictions on data not used during model training.

Note

  • If a column with the name fold is present in the train dataframe, H2O Hydrogen Torch will use the fold column values for folding; otherwise, a simple 5-fold (K-fold) will be applied (see the sketch after this list).

  • H2O Hydrogen Torch allows running experiments on single folds for faster experimenting and multiple folds to gain more trust in the model's generalization and performance capabilities.

  • The Folds setting will only be available if Custom holdout validation is not selected as the Validation Strategy.
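
A minimal sketch of this fold-column logic, assuming a pandas train dataframe and scikit-learn (H2O Hydrogen Torch's internal implementation is not shown here; the file name is hypothetical):

```python
import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv("train.csv")  # hypothetical train dataframe

if "fold" in df.columns:
    # A fold column is present: use its values for folding.
    folds = {f: df.index[df["fold"] == f] for f in sorted(df["fold"].unique())}
else:
    # Otherwise, fall back to a simple 5-fold split.
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    folds = {f: val_idx for f, (_, val_idx) in enumerate(kf.split(df))}
```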

Group fold column

Defines an optional dataset column to run Group-K-fold on. In the case of Group-K-fold, unique elements of the group fold column will always be within the same fold (the same group will not appear in two different folds).

Note

  • The Group Fold Column needs to contain at least five unique values; if that is the case, group 5-fold will be used for validation.

  • The Group Fold Column setting can be helpful if you want to emulate model performance on unseen data groups, such as new customer data. In this case, you do not want to train your model on samples of the same customers you are also evaluating.

  • The Group Fold Column setting will only be available if Grouped k-fold cross validation is selected as the Validation Strategy.
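
A minimal sketch of grouped folding, assuming scikit-learn's GroupKFold and a hypothetical customer_id group column:

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

df = pd.read_csv("train.csv")   # hypothetical train dataframe
groups = df["customer_id"]      # hypothetical group fold column

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(gkf.split(df, groups=groups)):
    # No customer_id value appears in both the train and validation split.
    assert set(groups.iloc[train_idx]).isdisjoint(groups.iloc[val_idx])
```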

Train dataframe

Defines a .csv or .pq file containing a dataframe with training records that H2O Hydrogen Torch will use to train the model.

Note

  • The records will be combined into mini-batches when training the model.

  • If a validation dataframe is provided, a fold column is not needed in the train dataframe.

Validation dataframe

Defines a .csv or .pq file containing a dataframe with validation records that H2O Hydrogen Torch will use to evaluate the model during training.

Note

  • Setting a separate holdout validation dataframe requires the Validation Strategy to be set to Custom holdout validation. In this case, H2O Hydrogen Torch will fully respect the choice of a separate validation dataframe and will not perform any internal cross-validation. In other words, the model is trained on the full provided train dataframe, and model performance is evaluated on the provided validation dataframe.

  • The validation dataframe should have the same format as the train dataframe but does not require a fold column.

Test dataframe

Defines a .csv or .pq file containing a dataframe with test records that H2O Hydrogen Torch will use to test the model.

Note

The test dataframe should have the same format as the train dataframe but does not require a label column.

Label columns

Defines the name(s) of the dataframe column(s) that refer to the target value(s) H2O Hydrogen Torch will aim to predict.

Note

  • There can be more than one label column; therefore, the target to predict can be single- or multi-column.

  • Image classification supports multiclass and multilabel classification.

Data sample

Modifies the percentage of the data to use for the experiment. The default percentage is 100% (1).

Note

Changing the default value can significantly increase the training speed, but it might lead to substantially worse accuracy. Also, modifying the data sample's default value downsamples not only the training data but also the validation data. Therefore, a model built on a modified data sample cannot be compared with an experiment run on the complete data (100%).
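
Conceptually, a data sample of 10% corresponds to something like the following pandas sketch (the file name is hypothetical; H2O Hydrogen Torch applies the sampling internally to both the training and validation data):

```python
import pandas as pd

df = pd.read_csv("train.csv")                  # hypothetical train dataframe
sampled = df.sample(frac=0.1, random_state=0)  # keep 10% of the records
```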

Image settings

There are no image experiment settings that are common to all supported problem types.

Augmentation settings

There are no augmentation experiment settings that are common to all supported problem types.

Architecture settings

There are no architecture experiment settings that are common to all supported problem types.

Training settings

Loss function

(Grid search hyperparameter)

Defines the loss function H2O Hydrogen Torch will use during model training. The loss function is a differentiable function measuring the prediction error. The model will use gradients of the loss function to update the model weights during training.
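
As a rough PyTorch illustration (not H2O Hydrogen Torch's internal code) of how a differentiable loss produces the gradients used for weight updates:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                  # toy model
criterion = nn.CrossEntropyLoss()         # one possible loss function

inputs, targets = torch.randn(4, 10), torch.tensor([0, 1, 1, 0])
loss = criterion(model(inputs), targets)  # measure the prediction error
loss.backward()                           # gradients flow back to the weights
```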

Epochs

(Grid search hyperparameter)

Defines the number of epochs to train the model. In other words, it specifies the number of times the learning algorithm will go through the entire training dataset.

Note

  • The Epochs setting is an important setting to tune because it balances under- and overfitting.

  • The learning rate highly impacts the optimal value of the epochs.

Batch size

(Grid search hyperparameter)

Defines the number of training examples a mini-batch will use during an iteration of model training to estimate the error gradient before updating the model weights. Batch Size defines the batch size used per single GPU.

Note

During model training, the training data is packed into mini-batches of a fixed size.

Automatically adjust batch size

If this setting is turned On, H2O Hydrogen Torch checks whether the specified Batch Size fits into GPU memory. If a GPU out-of-memory (OOM) error occurs, H2O Hydrogen Torch automatically halves the Batch Size until it fits into GPU memory or the Batch Size equals 1.
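
The behavior resembles the following sketch (the actual implementation is internal to H2O Hydrogen Torch; run_one_step is a hypothetical helper that runs one training step at a given batch size):

```python
import torch

def find_batch_size(run_one_step, batch_size):
    """Halve the batch size on GPU OOM until a step succeeds or size is 1."""
    while True:
        try:
            run_one_step(batch_size)
            return batch_size
        except RuntimeError as e:  # CUDA OOM surfaces as a RuntimeError
            if "out of memory" not in str(e) or batch_size == 1:
                raise
            torch.cuda.empty_cache()  # release cached memory before retrying
            batch_size //= 2
```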

Learning rate

(Grid search hyperparameter)

Defines the learning rate H2O Hydrogen Torch will use when training the model, specifically when updating the neural network's weights. The learning rate is the speed at which the model updates its weights after processing each mini-batch of data.

Note

  • Learning rate is an important setting to tune as it balances under- and overfitting.

  • The number of epochs highly impacts the optimal value of the learning rate.

Schedule

(Grid search hyperparameter)

Defines the learning rate schedule H2O Hydrogen Torch will use during model training. Specifying a learning rate schedule will prevent the learning rate from staying the same. Instead, a learning rate schedule will cause the learning rate to change over iterations, typically decreasing the learning rate to achieve a better model performance and training convergence.

Warmup epochs

Defines the number of epochs over which to warm up the learning rate, during which the learning rate increases linearly from 0 to the desired learning rate.
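
A minimal sketch of linear warmup followed by a cosine schedule, using PyTorch's LambdaLR (the schedules H2O Hydrogen Torch actually offers are selected in the UI; the step counts below are hypothetical):

```python
import math

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

warmup_steps, total_steps = 100, 1000  # hypothetical step counts

def lr_lambda(step):
    if step < warmup_steps:
        # Linear warmup from 0 to the target learning rate.
        return step / warmup_steps
    # Cosine decay for the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

Calling scheduler.step() after each optimizer update advances the schedule.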

Optimizer

(Grid search hyperparameter)

Defines the algorithm or method (optimizer) to use for model training. The selected algorithm or method defines how the model should change the attributes of the neural network, such as weights and learning rate. In general, optimizers solve optimization problems and make more accurate updates to attributes to reduce learning losses.

Weight decay

Defines the weight decay that H2O Hydrogen Torch will use for the optimizer during model training.

Note

Weight decay is a regularization technique that adds the L2 norm of all model weights to the loss function, increasing the chance of better model generalization.
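
For example, in PyTorch's SGD the weight_decay argument implements exactly this L2 penalty (which optimizers H2O Hydrogen Torch offers is selected in the UI):

```python
import torch

model = torch.nn.Linear(10, 2)
# For SGD, weight_decay adds an L2 penalty on the weights, equivalent to
# adding (weight_decay / 2) * ||w||^2 to the loss.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
```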

Gradient clip

Defines the gradient clip that H2O Hydrogen Torch will use during model training. Defaults to -1 (no clipping). When a value other than -1 is specified, it is used as an upper limit for the gradients, calculated per batch.

Note

This setting can help model convergence when extreme gradient values cause high volatility of weight updates.
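
Whether the limit applies to the gradient norm or to individual gradient values is not spelled out above; a common PyTorch pattern is norm-based clipping with clip_grad_norm_ (clip_grad_value_ is the per-value alternative):

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

loss = model(torch.randn(4, 10)).sum()  # toy forward pass
loss.backward()
# Rescale gradients so their global norm does not exceed the clip value.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```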

Grad accumulation

Defines the number of gradient accumulations before H2O Hydrogen Torch updates the neural network weights during model training.

Note

  • Grad accumulation can be beneficial if only small batches are selected for training. With gradient accumulation, the loss and gradients are calculated after each batch, but the weights are only updated after the selected number of accumulations, as sketched after this list. You can control the batch size through the Batch Size setting.

  • Changing the default value of Grad Accumulation might require adjusting the learning rate and batch size.
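
A minimal sketch of the accumulation loop (PyTorch-style, with a hypothetical Grad Accumulation value of 4 and toy data):

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()
accumulation_steps = 4  # hypothetical Grad Accumulation value

for step in range(100):
    inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))
    loss = criterion(model(inputs), targets) / accumulation_steps
    loss.backward()  # gradients accumulate across batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # update the weights every 4 batches
        optimizer.zero_grad()  # reset the accumulated gradients
```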

Save best checkpoint

Determines if H2O Hydrogen Torch should save the model weights of the epoch exhibiting the best validation metric. When turned On, H2O Hydrogen Torch saves the model weights for the epoch exhibiting the best validation metric. When turned Off, H2O Hydrogen Torch saves the model weights after the last epoch is executed.

Note

  • This setting should be turned On with care as it has the potential to lead to overfitting of the validation data.

  • The default goal should be to tune models so that the final epoch, or one close to it, is the best epoch.

  • If logging shows an evident decline in later epochs, it is usually better to adjust hyperparameters, such as reducing the number of epochs or increasing regularization, instead of turning this setting On.

Evaluation epochs

Defines the number of epochs H2O Hydrogen Torch will use before each validation loop for model training. In other words, it determines the frequency (in a number of epochs) to run the model evaluation on the validation data.

Note

  • Increasing the number of Evaluation Epochs can speed up an experiment.

  • The Evaluation epochs setting is available only if the following setting is turned Off: Save Best Checkpoint.

Calculate train metric

Determines whether the model metric should also be calculated for the training data at the end of the training. When On, the model metric is also calculated for the training data. The resulting values do not indicate the true model performance because they are based on the same data records used for model training, but they can give insights into over- and underfitting.

Train validation data

Defines whether the model should use the entire train and validation dataset during model training. When turned On, H2O Hydrogen Torch will use the whole train dataset and validation data to train the model.

Note

  • H2O Hydrogen Torch will still evaluate the model on the provided validation fold; validation is always performed only on the provided validation fold.

  • H2O Hydrogen Torch will use both datasets for model training if you provide a train and validation dataset.

    • To define a training dataset, use the Train Dataframe setting. For more information, see Train dataframe.
    • To define a validation dataset, use the Validation Dataframe setting. For more information, see Validation dataframe.
  • The Train validation data setting is only available if you turned the Save best checkpoint setting Off.

  • Turning On the Train validation data setting should produce a model that you can expect to perform better because H2O Hydrogen Torch trained the model on more data. Note, though, that using the entire train dataset and out-of-fold validation dataset generally causes the model's accuracy to be overstated, as information from the validation data is incorporated into the model during the training process.

    Example

    If you have five folds and set fold 0 as validation, H2O Hydrogen Torch will usually train on folds 1-4 and report performance on fold 0. With Train validation data turned On, fold 0 is added to the training data, but H2O Hydrogen Torch will still report its accuracy. As a result, the reported accuracy will be overstated for fold 0 but should be better for any unseen (test) data or production scenarios. For that reason, you usually want to consider this setting after running your experiments and deciding on models.

Drop last batch

When turned On, H2O Hydrogen Torch drops the last incomplete batch during model training.

Note

H2O Hydrogen Torch groups the train data into mini-batches of equal size during the training process, but the last batch can have fewer records than the others. Not dropping the last batch can lead to a less robust gradient estimation and a more volatile training step.
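
In PyTorch terms, this corresponds to the DataLoader's drop_last flag, sketched below with a toy dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,   # CPU worker processes; see the Number of workers setting
    drop_last=True,  # discard the final, smaller batch (here: 4 records)
)
```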

Prediction settings

Metric

Defines the evaluation metric to use to evaluate the model's performance. During and after an experiment, graphs will be available reflecting the selected evaluation metric.

Note

Usually, the evaluation metric should reflect the quantitative way of assessing the model's value for the corresponding use case.

Environment settings

GPUs

Determines the list of GPUs H2O Hydrogen Torch can use for the experiment. GPUs are listed by name, referring to their system ID (starting from 1).

Number of GPUs per run

Defines the number of GPUs to use for a single run when training the model. A single run might represent a single fold or a single grid search run.

Example

If 5 GPUs are available, it will be possible to run a 5-fold cross-validation in parallel using a single GPU per fold.

Note

  • The available GPUs will be the ones that can be enabled using the GPUs setting.

  • If the number of GPUs is less than or equal to 1, this setting (Number of GPUs per run) will not be available.

Mixed precision

Determines whether to use mixed-precision training during model training. When turned Off, H2O Hydrogen Torch will not use mixed-precision for training.

Note

Mixed-precision is a technique that helps decrease memory consumption and increases training speed.
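
A common PyTorch mixed-precision pattern looks like the following sketch (requires a CUDA GPU; H2O Hydrogen Torch's exact implementation is not shown here):

```python
import torch

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(8, 10, device="cuda")
targets = torch.randint(0, 2, (8,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # run eligible ops in float16
    loss = criterion(model(inputs), targets)
scaler.scale(loss).backward()    # scale the loss to avoid gradient underflow
scaler.step(optimizer)           # unscale gradients, then update the weights
scaler.update()                  # adjust the scale factor for the next step
```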

Sync batch normalization

Determines whether to synchronize batch normalization across GPUs in distributed data-parallel (DDP) mode. In other words, when turned On, multi-GPU training synchronizes the model's batch normalization layers across GPUs. In a nutshell, with multiple GPUs, H2O Hydrogen Torch splits each batch across the GPUs, so a normalization layer only has access to the part of the batch stored on its own device. This works out of the box, but collecting the data from all GPUs to normalize the entire batch can give better results.

Note

When turned On, data scientists can expect the training speed to drop slightly, while the model's accuracy may improve. In practice, however, accuracy gains are rare and occur only for specific problem types and batch sizes.
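
In PyTorch, the equivalent conversion is a one-liner, sketched below with a toy model (only meaningful when the model is trained under DDP):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.BatchNorm2d(8),
)
# Replace BatchNorm layers with SyncBatchNorm so statistics are computed
# over the entire batch across all GPUs.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
```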

Number of workers

Defines the number of workers H2O Hydrogen Torch will use for the DataLoader. In other words, it defines the number of CPU processes to use when reading and loading data to GPUs during model training.

Seed

Defines the random seed value that H2O Hydrogen Torch will use during model training. It defaults to -1, an arbitrary value. When the value is modified (not -1), the random seed makes results reproducible: defining a seed yields predictable and repeatable results on every run. Keeping the default seed value (-1) leads to different random numbers at every invocation.
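
A typical seeding pattern looks like the sketch below (an assumption about common practice, not necessarily H2O Hydrogen Torch's exact implementation):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    """Seed the common sources of randomness for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```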

Logging settings

Logger

Defines the logger type that H2O Hydrogen Torch will use for model training.

Neptune API token

Defines the Neptune API token to validate all subsequent Neptune API calls.

Neptune project

Defines the Neptune project to access if you selected Neptune in the Logger setting.
