
Experiment settings: Text metric learning

Besides the settings it shares with other problem types, a text metric learning experiment has the specific settings listed and described below.

Dataset Settings

Text Column

Defines the column name with the input text that H2O Hydrogen Torch will use during model training.

Tokenizer Settings

Lowercase

Grid search hyperparameter

Determines whether to lowercase the text that H2O Hydrogen Torch observes during the experiment. This setting is turned Off by default.

Note

When turned On, the observed text is always lowercased before training and prediction. Tuning this setting can lead to a higher accuracy value for certain types of datasets.
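For intuition, the sketch below (assuming a hypothetical cased Hugging Face backbone) shows how lowercasing changes what the tokenizer sees:

```python
from transformers import AutoTokenizer

# Illustrative backbone choice; any Hugging Face model name behaves analogously.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

text = "Hydrogen Torch"
# With Lowercase turned On, the input is lowercased before tokenization.
print(tokenizer.tokenize(text))          # tokens for the original, cased input
print(tokenizer.tokenize(text.lower()))  # usually different tokens in a cased vocabulary
```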

Max Length

Grid search hyperparameter

Defines the maximum length of the input sequence H2O Hydrogen Torch will use during model training. In other words, this setting specifies the maximum number of tokens an input text is transformed into for model training.

Note

A higher token count leads to higher memory usage, which slows down training, but can increase the probability of obtaining a higher accuracy value.
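As a rough illustration of what this setting controls, a Hugging Face tokenizer can truncate and pad every input to a fixed token count (the backbone name is an assumption for the example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative backbone

encoded = tokenizer(
    "A long input text that may exceed the limit ...",
    max_length=128,        # corresponds to the Max Length setting
    truncation=True,       # tokens beyond max_length are dropped
    padding="max_length",  # shorter texts are padded up to max_length
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 128])
```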

Architecture Settings

Embedding Size

Grid search hyperparameter

Defines the dimensionality H2O Hydrogen Torch will use for the embedding vector representing one sample during model training.

Note

  • The embedding size affects the granularity of the embedding of individual records (the embedding calculation) and of the cosine similarity calculation that follows it (see the sketch below).

  • A smaller embedding size typically leads to more general embeddings, while a larger one leads to more specific embeddings.

  • Tuning the embedding size can impact overfitting and underfitting.
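A minimal sketch of where the embedding size enters the pipeline (the head and shapes are illustrative assumptions, not Hydrogen Torch internals):

```python
import torch
import torch.nn.functional as F

backbone_dim = 768     # hidden size of the backbone (e.g., a BERT-base model)
embedding_size = 256   # corresponds to the Embedding Size setting

# Illustrative head: project the pooled backbone output into the embedding space.
head = torch.nn.Linear(backbone_dim, embedding_size)

pooled = torch.randn(2, backbone_dim)          # stand-in for two pooled samples
embeddings = F.normalize(head(pooled), dim=1)  # unit-length embedding vectors

# The cosine similarity calculation follows the embedding calculation.
print((embeddings[0] @ embeddings[1]).item())
```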

Backbone

Grid search hyperparameter

Defines the backbone neural network architecture used to train the model.

Note

H2O Hydrogen Torch provides several state-of-the-art backbone neural network architectures for model training. H2O Hydrogen Torch accepts backbone neural network architectures from the Hugging Face library; enter the architecture name to use one.

Tip

Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy.
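For example, a smaller and a larger backbone can be loaded by architecture name from the Hugging Face library (these two names are common choices, not a list of supported models):

```python
from transformers import AutoModel

small = AutoModel.from_pretrained("distilbert-base-uncased")  # quicker experiments
large = AutoModel.from_pretrained("roberta-large")            # aiming for the highest accuracy
```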

Gradient Checkpointing

Determines whether H2O Hydrogen Torch activates gradient checkpointing (GC) when training the model. GC reduces the video random access memory (VRAM) footprint at the cost of a longer runtime (an additional forward pass). Turning this setting On enables GC during the training process.

Note

Gradient checkpointing is an experimental setting that is not compatible with all backbones. If a backbone is not supported, the experiment will fail, and H2O Hydrogen Torch will note in the logs that the selected backbone is not compatible with gradient checkpointing. To learn about the backbone setting, see Backbone.

Tip

Activating GC comes at the cost of a longer training time; for that reason, try training without GC first and only activate it when experiencing GPU out-of-memory (OOM) errors.
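The sketch below shows how gradient checkpointing is typically enabled on a Hugging Face backbone (the model name is illustrative):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # illustrative backbone

# GC trades compute for memory: activations are recomputed during the backward
# pass (an additional forward pass), lowering the VRAM footprint.
if model.supports_gradient_checkpointing:
    model.gradient_checkpointing_enable()
```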

Pool

Grid search hyperparameter

Defines the global pooling method that H2O Hydrogen Torch will apply after the backbone in the model architecture.
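As one example, masked mean pooling is a common global pooling method; the sketch below is illustrative, not necessarily the exact implementation behind every Pool option:

```python
import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings while ignoring padding tokens."""
    mask = attention_mask.unsqueeze(-1).float()   # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)    # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)      # avoid division by zero
    return summed / counts                        # (batch, hidden_size)
```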

Custom Intermediate Dropout

Determines whether to enable a custom dropout rate for intermediate layers in the transformer model. When turned Off, H2O Hydrogen Torch uses the pre-trained backbone's default dropout rate of 0.1. See Intermediate Dropout to learn how to define a custom dropout rate.

Intermediate Dropout

Defines the custom dropout rate H2O Hydrogen Torch will use for intermediate layers in the transformer model.
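A sketch of overriding the backbone's intermediate dropout via its Hugging Face config (attribute names assume a BERT-style architecture and vary by backbone):

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-uncased")  # illustrative backbone
config.hidden_dropout_prob = 0.2             # custom Intermediate Dropout rate
config.attention_probs_dropout_prob = 0.2    # often adjusted together
model = AutoModel.from_pretrained("bert-base-uncased", config=config)
```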

Dropout

Grid search hyperparameter

Defines the dropout rate that H2O Hydrogen Torch applies between the backbone and the neck of the model during training. Dropout helps the model generalize better by randomly dropping a share of the neural network connections.
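A minimal sketch of where this dropout sits in the model (module names are illustrative, not Hydrogen Torch internals):

```python
import torch.nn as nn

dropout = nn.Dropout(p=0.1)  # corresponds to the Dropout setting

def forward_neck(pooled, neck: nn.Module):
    # During training, a share of connections is randomly zeroed out,
    # which helps the model generalize better.
    return neck(dropout(pooled))
```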

Training Settings

ArcFace Margin

Grid search hyperparameter

Defines the margin for ArcFace loss; higher values result in a bigger separation of samples.

Note

  • Tuning this setting can impact the training and quality of embeddings.

  • This setting depends on the dataset at hand.

ArcFace Scale

Grid search hyperparameter

Defines the scale for ArcFace loss; the scale multiplies the cosine logits before the softmax, so higher values produce sharper, more confident predictions.
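For intuition, here is a minimal sketch (not Hydrogen Torch internals) of how the ArcFace Margin and ArcFace Scale settings enter the loss:

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, class_centers, labels, margin=0.5, scale=30.0):
    """margin -> ArcFace Margin setting; scale -> ArcFace Scale setting."""
    # Cosine similarity between normalized embeddings and class centers.
    cosine = F.normalize(embeddings, dim=1) @ F.normalize(class_centers, dim=1).T
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    # Add the angular margin only on the ground-truth class, pushing classes apart.
    one_hot = F.one_hot(labels, num_classes=class_centers.size(0)).bool()
    theta = torch.where(one_hot, theta + margin, theta)
    # The scale sharpens the logits that are then fed into cross-entropy.
    return scale * torch.cos(theta)
```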

Prediction Settings

Top K Similar

Defines the number (k) of similar predictions to keep for each record.

Note

Defining this setting impacts the output predictions and metrics (those that rely on a top-k selection), but not the training process.
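A sketch of the top-k selection at prediction time (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def top_k_similar(query_embeddings, database_embeddings, k=5):
    # Cosine similarity between every query and every database record.
    query = F.normalize(query_embeddings, dim=1)
    database = F.normalize(database_embeddings, dim=1)
    similarity = query @ database.T
    # Keep the k most similar records per query; k is the Top K Similar setting.
    return similarity.topk(k, dim=1)  # (scores, indices)
```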

Environment Settings

A text metric learning experiment does not have specific environment settings besides those specified in the environment settings section of the common experiment settings page.

Logging Settings

Number of Texts

This setting defines the number of texts to show in the experiment Insights tab.

