Experiment settings: Text sequence to sequence¶
In addition to the settings it shares with other problem types, a text sequence to sequence experiment has the specific settings listed and described below.
Dataset Settings¶
Text Column¶
Defines the column name with the input text that H2O Hydrogen Torch will use during model training.
Tokenizer Settings¶
Lowercase¶
Grid search hyperparameter
Determines whether to lowercase the text that H2O Hydrogen Torch will observe during the experiment. This setting is turned Off by default.
Note
When turned On, the observed text will always be lowercased before training and prediction. Tuning this setting can potentially lead to a higher accuracy value for certain types of datasets.
Max Length¶
Grid search hyperparameter
Defines the maximum length of the input sequence H2O Hydrogen Torch will use during model training. In other words, this setting specifies the maximum number of tokens an input text is transformed to for model training.
Note
A higher token count leads to higher memory usage and slows down training, while increasing the probability of obtaining a higher accuracy value.
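As a minimal illustration of what the Max Length cap does, the toy sketch below uses a simple whitespace tokenizer (real backbones use subword tokenizers, so this is an assumption for illustration only) and discards every token past the limit:

```python
# Illustrative sketch, not H2O Hydrogen Torch code: a toy whitespace
# tokenizer showing how a Max Length cap truncates the token sequence.
def tokenize(text: str, max_length: int) -> list[str]:
    tokens = text.lower().split()   # toy tokenization; real backbones use subword tokenizers
    return tokens[:max_length]      # anything past max_length is discarded

print(tokenize("The quick brown fox jumps over the lazy dog", max_length=5))
# → ['the', 'quick', 'brown', 'fox', 'jumps']
```

Because memory usage grows with the token count, truncating at Max Length bounds that cost at the risk of discarding information from long inputs.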
Label Max Length¶
Defines the maximum length of the target text H2O Hydrogen Torch will use during model training.
Architecture Settings¶
Backbone¶
Grid search hyperparameter
Defines the backbone neural network architecture to train the model.
Note
H2O Hydrogen Torch provides several state-of-the-art backbone neural network architectures for model training. H2O Hydrogen Torch accepts backbone neural network architectures from the Hugging Face library (enter the architecture name).
Tip
Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy.
Gradient Checkpointing¶
Determines whether H2O Hydrogen Torch activates gradient checkpointing (GC) when training the model. GC reduces the video random access memory (VRAM) footprint at the cost of a longer runtime (an additional forward pass). Turning this setting On enables GC during the training process.
Note
Gradient checkpointing is an experimental setting that is not compatible with all backbones. If a backbone is not supported, the experiment will fail, and H2O Hydrogen Torch will note in the logs that the selected backbone is not compatible with gradient checkpointing. To learn about the backbone setting, see Backbone.
Tip
Activating GC comes at the cost of a longer training time; for that reason, try training without GC first and only activate it when experiencing GPU out-of-memory (OOM) errors.
Custom Intermediate Dropout¶
Determines whether to enable a custom dropout rate for intermediate layers in the transformer model. When turned Off, H2O Hydrogen Torch uses the pre-trained backbone's default dropout rate (usually 0.1). See Intermediate Dropout to learn how to define a custom dropout rate.
Intermediate Dropout¶
Defines the custom dropout rate H2O Hydrogen Torch will use for intermediate layers in the transformer model.
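As a rough illustration of what a dropout rate means (a toy pure-Python sketch, not the backbone's actual implementation), dropout zeroes each intermediate activation with probability `p` during training and rescales the survivors by `1/(1-p)` (so-called inverted dropout) to keep the expected activation unchanged:

```python
import random

# Illustrative sketch: inverted dropout applied to a list of activations.
# Each value is zeroed with probability p; survivors are rescaled by 1/(1-p).
def dropout(values, p, rng):
    if p == 0.0:
        return list(values)          # dropout disabled: pass activations through
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]

rng = random.Random(0)
print(dropout([0.5, 1.0, 1.5, 2.0], p=0.1, rng=rng))
```

A higher dropout rate regularizes more aggressively, which can help on small or noisy datasets but may hurt on large ones.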
Training Settings¶
A text sequence to sequence experiment does not have specific training settings besides those specified in the training settings section of the common experiment settings page.
Prediction Settings¶
Max Length¶
Defines the maximum length of the text H2O Hydrogen Torch will generate during prediction.
Note
- Similar to the Max Length setting in the Tokenizer Settings section, this setting specifies the maximum number of tokens to predict for a given prediction sample.
- This setting impacts predictions and the evaluation metrics, and should depend on the dataset and the expected average length of the output sequences.
Do Sample¶
Determines whether to sample from the next token distribution instead of choosing the token with the highest probability. If turned On, the next token in a predicted sequence is sampled based on the probabilities. If turned Off, the token with the highest probability is always chosen.
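The difference between the two modes can be sketched for a single decoding step (a toy next-token distribution, assumed for illustration only):

```python
import random

# Illustrative sketch: greedy decoding (Do Sample Off) vs. sampling
# (Do Sample On) for one decoding step over a toy distribution.
probs = {"cat": 0.6, "dog": 0.3, "bird": 0.1}    # toy next-token probabilities

greedy = max(probs, key=probs.get)               # Off: always the most likely token
rng = random.Random(0)
sampled = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]  # On

print(greedy)   # "cat" every time
print(sampled)  # usually "cat", but "dog" or "bird" are possible
```

Sampling yields more varied generations at the cost of occasionally picking low-probability tokens.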
Num Beams¶
Defines the number of beams to use for beam search. The default value is 1 (a single beam), which is equivalent to no beam search.
Note
A higher Num Beams value can increase prediction runtime while potentially improving accuracy.
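To make the runtime/accuracy trade-off concrete, the toy sketch below (an illustrative beam search, not H2O Hydrogen Torch internals; the conditional next-token model is assumed) keeps the `num_beams` highest-scoring partial sequences at each step, which lets it find a sequence that greedy decoding (one beam) misses:

```python
import math

# Toy conditional next-token model: probabilities depend on the prefix.
def next_probs(prefix):
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"x": 0.5, "y": 0.5}
    return {"z": 1.0}                    # "b" is always followed by "z"

# Illustrative beam search: keep the num_beams best partial sequences per step.
def beam_search(steps, num_beams):
    beams = [([], 0.0)]                  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in next_probs(tokens).items()
        ]
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:num_beams]
    return beams[0][0]

print(beam_search(steps=2, num_beams=1))  # greedy path: starts with 'a' (prob 0.3)
print(beam_search(steps=2, num_beams=2))  # → ['b', 'z'] (prob 0.4)
```

Each extra beam multiplies the candidates expanded per step, which is why higher Num Beams values increase prediction runtime.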
Temperature¶
Defines the temperature to use for sampling from the next token distribution during validation and inference. In other words, the defined temperature controls the randomness of predictions by scaling the logits before applying softmax. A higher temperature makes the distribution more random.
Note
- The temperature value only takes effect when the Do Sample setting is enabled (On).
- To learn more about this setting, refer to the following article: How to generate text: using different decoding methods for language generation with Transformers.
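The scaling can be sketched in a few lines (an illustrative softmax implementation, assumed for demonstration; logit values are made up): dividing the logits by the temperature before applying softmax flattens the distribution when the temperature is above 1 and sharpens it when below 1.

```python
import math

# Illustrative sketch: temperature scales logits before softmax.
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # made-up next-token logits
print(softmax_with_temperature(logits, 1.0))   # baseline distribution
print(softmax_with_temperature(logits, 2.0))   # flatter: sampling is more random
print(softmax_with_temperature(logits, 0.5))   # sharper: closer to greedy decoding
```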
Environment Settings¶
A text sequence to sequence experiment does not have specific environment settings besides those specified in the environment settings section of the common experiment settings page.
Logging Settings¶
Number of Texts¶
Defines the number of text samples to display in the experiment Insights tab.