Experiment settings: Audio regression

In addition to the common experiment settings shared with other problem types, an audio regression experiment has the specific settings described below.

Dataset Settings

Data Folder Test

Defines the location of the folder containing the audios H2O Hydrogen Torch will load when testing the model. This setting is only available if a test dataframe is selected.

Note

The Data Folder Test setting will appear when you specify a test dataframe using the Test Dataframe setting.

Audio Column

Defines the dataframe column that stores the names of the audios H2O Hydrogen Torch will load from the Data Folder and Data Folder Test when training and testing the model.

Audio Settings

Sample Rate

Defines the sample rate (Hz) to which H2O Hydrogen Torch will resample the audio files for training and inference (validation and prediction). This setting is useful when the audio files in the dataset have mixed sample rates (22 kHz, 32 kHz, 44 kHz, etc.).

Note

Resampling the audio files to a common sample rate can result in faster training.
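To illustrate what resampling to a common rate means, here is a minimal sketch that uses plain linear interpolation. It is an assumption for illustration only: the function name `resample_linear` is hypothetical, the source does not say which resampling method H2O Hydrogen Torch uses, and production resamplers normally apply anti-aliasing filters that this sketch omits.

```python
def resample_linear(samples, orig_rate, target_rate):
    """Resample a mono signal with linear interpolation (illustrative only)."""
    if orig_rate == target_rate or not samples:
        return list(samples)
    # Number of output samples covering the same duration at the new rate.
    n_out = int(len(samples) * target_rate / orig_rate)
    out = []
    for i in range(n_out):
        pos = i * orig_rate / target_rate      # fractional position in the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of audio at 44.1 kHz becomes 22,050 samples at 22.05 kHz.
one_second = [0.0] * 44100
resampled = resample_linear(one_second, 44100, 22050)
```

After this step, every clip has the same number of samples per second, so a fixed chunk length in seconds maps to a fixed number of samples.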

Training Chunk Seconds

Defines the chunk size in seconds that H2O Hydrogen Torch will use to sample the audio for training. Shorter audio clips will be padded with zeros.

Max Inference Chunk Seconds

Defines the maximum chunk size in seconds that H2O Hydrogen Torch will use from each audio during inference. Shorter audio clips will be used as-is.
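The difference between the two chunk settings can be sketched as follows. This is a hedged illustration, not the product's implementation: the function names are hypothetical, and the random crop position, the padding at the end of the clip, and taking the start of the clip at inference are all assumptions that the source does not confirm.

```python
import random


def training_chunk(samples, sample_rate, chunk_seconds, rng=random):
    """Return exactly chunk_seconds of audio for training."""
    target = int(sample_rate * chunk_seconds)
    if len(samples) <= target:
        # Shorter clips are padded with zeros (padding at the end is an assumption).
        return list(samples) + [0.0] * (target - len(samples))
    # Longer clips: take a random window (the crop position is an assumption).
    start = rng.randrange(len(samples) - target + 1)
    return list(samples[start:start + target])


def inference_chunk(samples, sample_rate, max_chunk_seconds):
    """Return at most max_chunk_seconds of audio; shorter clips are used as-is."""
    limit = int(sample_rate * max_chunk_seconds)
    # Taking the first `limit` samples is an assumption.
    return list(samples[:limit])
```

For example, with a sample rate of 10 Hz and a 2-second chunk, a 1-second clip yields 20 training samples (10 real, 10 zeros) but only 10 inference samples.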

STFT Window Size

Grid search hyperparameter

Defines the window size H2O Hydrogen Torch will use for the Short-time Fourier transform (STFT).

Note

There is a trade-off between time and frequency resolution in spectrograms. Shorter windows improve the temporal resolution at the expense of frequency resolution.

Hop Size

Grid search hyperparameter

Defines the number of audio samples H2O Hydrogen Torch will use between adjacent short-time Fourier transform (STFT) columns.

Note

Smaller values can improve the temporal resolution in the spectrogram by using more overlapping windows.
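The trade-offs controlled by the window size and hop size can be made concrete with a little arithmetic. The sketch below assumes that the FFT size equals the window size and that no padding is applied at the clip edges; the function name `stft_summary` is hypothetical, and the source does not confirm either assumption for H2O Hydrogen Torch.

```python
def stft_summary(sample_rate, window_size, hop_size, num_samples):
    """Return (frequency step in Hz, time step in ms, number of STFT columns)."""
    # Spacing between adjacent frequency bins (assumes FFT size == window size).
    freq_step_hz = sample_rate / window_size
    # Time between adjacent STFT columns.
    time_step_ms = 1000.0 * hop_size / sample_rate
    # Number of full windows that fit in the clip (assumes no edge padding).
    if num_samples < window_size:
        num_frames = 0
    else:
        num_frames = 1 + (num_samples - window_size) // hop_size
    return freq_step_hz, time_step_ms, num_frames


# A 5-second clip at 32 kHz with a 1024-sample window and a 320-sample hop.
print(stft_summary(32000, 1024, 320, 160000))  # (31.25, 10.0, 497)
```

Halving the window size to 512 doubles the frequency step to 62.5 Hz (coarser frequency resolution), while halving the hop size roughly doubles the number of columns (finer temporal resolution).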

Mel Frequency Bins

Grid search hyperparameter

Defines the number of frequency bins H2O Hydrogen Torch will use on the Mel scale spectrogram.

Note

Larger values can improve frequency resolution, but they require longer STFT windows.

Minimum Frequency

Defines the minimum frequency (Hz) H2O Hydrogen Torch will use for spectrograms.

Maximum Frequency

Defines the maximum frequency (Hz) H2O Hydrogen Torch will use for spectrograms.
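How the number of Mel bins interacts with the minimum and maximum frequency can be sketched by computing the band edges of a Mel filter bank. The sketch assumes the widely used HTK-style conversion, mel = 2595 * log10(1 + f / 700); the source does not say which Mel variant H2O Hydrogen Torch uses, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style Hz to Mel conversion (one common variant, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 band edges in Hz, equally spaced on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_mels + 2)]


# Hypothetical example values: 128 bins between 20 Hz and 16 kHz.
edges = mel_band_edges(128, 20.0, 16000.0)
```

The edges are equally spaced on the Mel scale, so the bands are narrow at low frequencies and wide at high frequencies. With more bins, the lowest bands can become narrower than the STFT frequency step, which is why more bins tend to call for longer windows.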

Spectrogram Normalization

Grid search hyperparameter

Defines the transformation used to normalize the spectrogram data before training the model.

Augmentation Settings

Mix Audio

Grid search hyperparameter

Defines the audio mix augmentation to use during model training. When Disabled is selected, no mix augmentation is applied. Mixup mixes (adds) two audios according to a random ratio.

Mix Target

Grid search hyperparameter

Defines the target (label) mix augmentation to apply during model training. Ratio is used as the mixed target if disabled is selected.

Options
  • Ratio: The two targets will be averaged according to the sample mix ratio during model training.

  • Min: The minimum of both targets will be taken while ignoring the ratio during model training.

  • Max: The maximum of both targets will be taken while ignoring the ratio during model training.
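The Mixup operation and the three target options can be sketched as below. This is a minimal illustration under assumptions: the function names and the lowercase mode strings are hypothetical, and the source does not specify exactly how H2O Hydrogen Torch weights the two signals.

```python
def mix_audio(audio_a, audio_b, ratio):
    """Mix two equal-length signals; `ratio` is the weight of audio_a."""
    return [ratio * a + (1.0 - ratio) * b for a, b in zip(audio_a, audio_b)]


def mix_target(target_a, target_b, ratio, mode="ratio"):
    """Combine two regression targets according to the Mix Target option."""
    if mode == "ratio":
        # Weighted average that follows the audio mix ratio.
        return ratio * target_a + (1.0 - ratio) * target_b
    if mode == "min":
        # The ratio is ignored.
        return min(target_a, target_b)
    if mode == "max":
        # The ratio is ignored.
        return max(target_a, target_b)
    raise ValueError(f"unknown mode: {mode}")


mixed = mix_audio([1.0, 0.0], [0.0, 1.0], 0.25)   # [0.25, 0.75]
label = mix_target(2.0, 4.0, 0.25)                # 3.5
```

With Ratio, the mixed target moves toward whichever audio dominates the mix; Min and Max return one of the two original targets regardless of the ratio.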

Mix Concentration

Grid search hyperparameter

Defines the concentration parameter value of the Beta probability distribution to generate mix ratios. A larger value will lead to more equal ratios (50% - 50%) for mixing. Mix concentration is only available when Mixup is selected in the Mix Audio setting.
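The effect of the concentration parameter can be checked empirically with Python's standard `random.betavariate`. The sketch assumes a symmetric Beta(concentration, concentration) distribution, as in the original mixup formulation; the source does not confirm that H2O Hydrogen Torch uses the same value for both shape parameters, and the helper names are hypothetical.

```python
import random


def sample_mix_ratios(concentration, n, seed=0):
    """Draw n mix ratios from a symmetric Beta distribution (assumed symmetric)."""
    rng = random.Random(seed)
    return [rng.betavariate(concentration, concentration) for _ in range(n)]


def mean_distance_from_half(ratios):
    """Average distance of the ratios from an equal (50% - 50%) mix."""
    return sum(abs(r - 0.5) for r in ratios) / len(ratios)


low = mean_distance_from_half(sample_mix_ratios(0.2, 2000))
high = mean_distance_from_half(sample_mix_ratios(5.0, 2000))
print(f"concentration 0.2: {low:.2f}, concentration 5.0: {high:.2f}")
```

A small concentration pushes most ratios toward 0 or 1, so one audio dominates each mix; a large concentration pulls the ratios toward 0.5.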

Mix Probability

Grid search hyperparameter

Defines the probability of applying mix augmentation. The mix probability value is used for each batch or mix iteration. Mix probability is only available when Mixup is selected in the Mix Audio setting.

Example

If the mixing probability is specified as 0.3, mix augmentation will be applied to each batch (or mix iteration) with a probability of 0.3.

Mix Iterations

Grid search hyperparameter

Defines the number of times to apply mix augmentation on each batch. The larger the value, the more audios are mixed into a single training sample. Mix iterations is only available when Mixup is selected in the Mix Audio setting.
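How mix probability and mix iterations could combine on one batch is sketched below. Everything here is an assumption for illustration: the function name `augment_batch`, pairing each sample with a shuffled copy of the batch, drawing one ratio per iteration, and using the Ratio target option are not confirmed by the source.

```python
import random


def augment_batch(batch, mix_probability, mix_iterations, concentration, rng):
    """batch: list of (samples, target) pairs whose samples have equal length."""
    for _ in range(mix_iterations):
        # Each iteration is applied only with probability `mix_probability`.
        if rng.random() >= mix_probability:
            continue
        # Pair every sample with a randomly chosen partner from the same batch.
        partners = batch[:]
        rng.shuffle(partners)
        ratio = rng.betavariate(concentration, concentration)
        batch = [
            (
                [ratio * x + (1.0 - ratio) * y for x, y in zip(a, b)],
                ratio * t_a + (1.0 - ratio) * t_b,  # "Ratio" target option
            )
            for (a, t_a), (b, t_b) in zip(batch, partners)
        ]
    return batch


rng = random.Random(0)
batch = [([1.0, 1.0], 1.0), ([0.0, 0.0], 0.0)]
augmented = augment_batch(batch, 0.3, 2, 0.4, rng)
```

Because each iteration mixes the already mixed samples again, every additional iteration can blend more of the original audios into a single training sample.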

Architecture Settings

Backbone

Grid search hyperparameter

Defines the backbone neural network architecture to train the model.

Note

H2O Hydrogen Torch provides several state-of-the-art backbone neural network architectures for model training. It also accepts backbone architectures from the timm library; to use one, enter the architecture name.

Tip

Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy.

Pretrained

Defines whether the neural network should start with pre-trained weights. When this setting is On, training starts from a model pre-trained on a generic task. When turned Off, the initial weights of the neural network are random.

Pool

Grid search hyperparameter

Defines the global pooling method before the final fully connected layer that H2O Hydrogen Torch will use in the model architecture. Instead of adding a fully connected layer on top of the feature maps, global pooling is applied to each feature map beforehand.
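Global pooling reduces each feature map to a single number, so the final fully connected layer receives one value per channel. The sketch below shows average and max pooling in plain Python; the function name and the option strings "avg" and "max" are hypothetical and are not taken from the product's list of pooling methods.

```python
def global_pool(feature_maps, method="avg"):
    """feature_maps: list of 2D lists (one height x width map per channel)."""
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        if method == "avg":
            pooled.append(sum(values) / len(values))
        elif method == "max":
            pooled.append(max(values))
        else:
            raise ValueError(f"unknown pooling method: {method}")
    return pooled


maps = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 0.0], [0.0, 8.0]],   # channel 1
]
print(global_pool(maps, "avg"))  # [2.5, 2.0]
print(global_pool(maps, "max"))  # [4.0, 8.0]
```

The output length depends only on the number of channels, not on the spectrogram size, which is what lets the same head work with chunks of different lengths.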

Dropout

Grid search hyperparameter

Defines the dropout rate that H2O Hydrogen Torch will apply between the backbone and neck of the model during model training. Dropout helps the model generalize better by randomly dropping a share of the neural network connections.
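The mechanics of dropout can be sketched with the common "inverted dropout" formulation, in which the surviving values are rescaled so that their expected sum is unchanged. This is an illustrative assumption: the function name is hypothetical, and the source does not describe the exact dropout variant H2O Hydrogen Torch applies.

```python
import random


def dropout(features, rate, rng, training=True):
    """Zero each feature with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Dropout is only active during training.
        return list(features)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in features]


rng = random.Random(0)
features = [1.0] * 8
dropped = dropout(features, 0.5, rng)                 # each value is 0.0 or 2.0
unchanged = dropout(features, 0.5, rng, training=False)
```

A higher rate removes more of the features passed from the backbone to the neck on each training step, which regularizes more strongly but can slow convergence.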

Training Settings

An audio regression experiment does not have specific training settings besides those specified in the training settings section of the common experiment settings page.

Prediction Settings

An audio regression experiment does not have specific prediction settings besides those specified in the prediction settings section of the common experiment settings page.

Environment Settings

An audio regression experiment does not have specific environment settings besides those specified in the environment settings section of the common experiment settings page.

Logging Settings

Number Of Audios

Defines the number of audios to show in the experiment's Insights tab.
