
Experiment settings: Image object detection

In addition to the common experiment settings shared with other problem types, an image object detection experiment has the specific settings described below.

Model Type

For an image object detection experiment, you can specify a model type when defining the experiment settings:

  1. In the Model Type list, select the model that you want to use.

    Note

    • H2O Hydrogen Torch supports the following model types: EfficientDet, Faster RCNN, and FCOS.

    • When defining an image object detection experiment, the selected experience level and model type determine the available settings.

Efficientdet

EfficientDet models are among the most popular models for image object detection. They use EfficientNet models as the backbone and a weighted bi-directional feature pyramid network (BiFPN) as the feature network.

Note

EfficientDet is the default model type for image object detection in H2O Hydrogen Torch. To learn more about EfficientDet, see EfficientDet: Scalable and Efficient Object Detection.

Faster Rcnn

Faster Region-based Convolutional Neural Network (FasterRCNN) is an advancement of the classical Region-based Convolutional Neural Network (RCNN) architecture. The core idea of RCNN is to extract regions of interest (ROIs) from an image, classically with selective search, where each ROI might represent the bounding box of an object. Each ROI is fed through a neural network to produce output features used to classify the type of object. A FasterRCNN instead uses a region proposal network that shares full-image convolutional features with the detection network. This enables nearly cost-free region proposals and makes training and inference significantly faster than with classical RCNN or Fast RCNN networks.

Fcos

Both EfficientDet and FasterRCNN are so-called anchor-based object detection models. In contrast, a fully convolutional one-stage object detector (FCOS) solves object detection with per-pixel predictions, similar to how semantic segmentation models operate. FCOS is anchor-box free and proposal free.

Dataset Settings

Data Folder

Defines the folder location of the images to use for the experiment. When the experiment is running, H2O Hydrogen Torch will load images from this folder.

Data Folder Test

Defines the folder location of the images that H2O Hydrogen Torch loads when testing the model.

Note

The Data Folder Test setting will appear when you specify a test dataframe using the Test Dataframe setting.

Class Name Column

Defines the dataset column containing a list of class names that H2O Hydrogen Torch will use for each bounding box.

X Min Column

Defines the dataset column containing a list of minimum X positions H2O Hydrogen Torch will use for each bounding box.

Y Min Column

Defines the dataset column containing a list of minimum Y positions H2O Hydrogen Torch will use for each bounding box.

X Max Column

Defines the dataset column containing a list of maximum X positions H2O Hydrogen Torch will use for each bounding box.

Y Max Column

Defines the dataset column containing a list of maximum Y positions H2O Hydrogen Torch will use for each bounding box.

Image Column

Defines the dataframe column storing the names of the images that H2O Hydrogen Torch loads from the Data Folder and Data Folder Test locations when training and testing the model.
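
As a rough illustration of the column layout described above, the sketch below builds one record per image, with list-valued columns that hold one entry per bounding box. The column names (image, class_name, x_min, and so on) and the validate_row helper are hypothetical; any names work as long as they are mapped in the settings above.

```python
# Hypothetical rows: one per image, with one list entry per bounding box.
rows = [
    {
        "image": "street_001.jpg",
        "class_name": ["car", "person"],
        "x_min": [34, 210],
        "y_min": [50, 80],
        "x_max": [180, 260],
        "y_max": [140, 230],
    },
]

BOX_COLUMNS = ["class_name", "x_min", "y_min", "x_max", "y_max"]


def validate_row(row):
    """Return the number of boxes, or raise ValueError if the row is inconsistent."""
    lengths = {len(row[col]) for col in BOX_COLUMNS}
    if len(lengths) != 1:
        raise ValueError(f"{row['image']}: box columns differ in length")
    for x0, y0, x1, y1 in zip(row["x_min"], row["y_min"], row["x_max"], row["y_max"]):
        if x0 >= x1 or y0 >= y1:
            raise ValueError(f"{row['image']}: min must be smaller than max")
    return lengths.pop()


for row in rows:
    print(row["image"], validate_row(row), "boxes")
```

A check like this before uploading a dataset can catch rows where the box lists have drifted out of sync.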

Image Settings

Image Width

Defines the width H2O Hydrogen Torch will use to rescale the images for training and predictions.

Note

Depending on the original image size, a larger width can yield higher accuracy.

Image Height

Defines the height H2O Hydrogen Torch will use to rescale the images for training and predictions.

Note

Depending on the original image size, a larger height can yield higher accuracy.

Image Channels

Defines the number of channels the train images contain.

Note

  • Typically, images have three input channels (red, green, and blue, or RGB), but grayscale images have only one. When you provide image data in a NumPy data format, any number of channels is allowed, which is why the number of channels can be specified.

  • The defined number of channels also applies to the provided validation and test datasets.

Image Normalization

Grid search hyperparameter

Defines the transformer to normalize the image data before training the model.

Note

Usually, state-of-the-art image models normalize the training images by scaling values of each of the input channels to predefined means and standard deviations.
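
The per-channel scaling mentioned in the note can be sketched as follows. The example uses the widely published ImageNet channel means and standard deviations; whether a particular Image Normalization option in H2O Hydrogen Torch uses exactly these values is an assumption, and normalize_pixel is a hypothetical helper.

```python
# ImageNet channel statistics, commonly used with pretrained backbones.
# Assumption: the selected normalization option may use different values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


print(normalize_pixel((255, 128, 0)))
```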

Augmentation Settings

Augmentations Strategy

Grid search hyperparameter

Defines the augmentation strategy to apply to the input images. Soft, Medium, and Hard values correspond to the strength of the augmentations to apply.

Options
  • Soft: The Soft strategy applies image Resize and random HorizontalFlip during model training while applying image Resize during model inference.

  • Medium: The Medium strategy adds ShiftScaleRotate and CoarseDropout to the list of the train augmentations.

  • Hard: The Hard strategy applies RandomResizedCrop (instead of Resize) during model training while adding RandomBrightnessContrast to the list of train augmentations.

  • Custom: The Custom strategy lets you use your own augmentations, which you define in the Custom Train Augmentations and Custom Inference Augmentations settings.

Note

Augmentations are ways to modify train images while keeping the target values valid, such as flipping the image or adding noise. Distorting training images does not influence the expected prediction of the model but enriches the training data. Augmentations help the model generalize better and improve its accuracy.

Custom Train Augmentations

Defines a list of augmentations to use for the train data. The expected format is the .json output of the albumentations.save() function from the Albumentations library. You can use the IMAGE_HEIGHT and IMAGE_WIDTH placeholders to refer to the image dimensions from the experiment configuration.

Note

Augmentations are ways to modify train images while keeping the target values valid, such as flipping the image or adding noise. Distorting training images does not influence the expected prediction of the model but enriches the training data. Augmentations help the model generalize better and improve its accuracy. Augmentations are applied to every image at each epoch with the provided probability.

Custom Inference Augmentations

Defines a list of inference augmentations to apply to the test and validation data. The expected format is the .json output of the albumentations.save() function from the Albumentations library. You can use the IMAGE_HEIGHT and IMAGE_WIDTH placeholders to refer to the image dimensions from the experiment configuration.

Note

Inference augmentations serve the same purpose as training augmentations, but the difference is that inference augmentations are applied to validation and test data. Typically, inference augmentations only contain resizing or very simple augmentations.

Mix Image

Grid search hyperparameter

Defines the image mix augmentation to use during model training. When Disabled is selected, no mix augmentation is applied. The Mixup and Cutmix options select the corresponding mix augmentation.

Note

For image object detection, the Mixup augmentation uses the union of all the target boxes in the mixed images. In contrast, the Cutmix augmentation uses the target boxes from the corresponding region of each image. Also, during the Cutmix augmentation, H2O Hydrogen Torch cuts out and replaces only the corners of an image with a patch from another image.
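
The Mixup behavior for boxes can be sketched roughly as follows: pixel values are blended with a mix ratio, and the target boxes of both images are kept. This is a simplified illustration on tiny grayscale images stored as nested lists; the mixup function and the box tuple layout are hypothetical and are not the H2O Hydrogen Torch implementation.

```python
def mixup(image_a, boxes_a, image_b, boxes_b, lam):
    """Blend two same-sized grayscale images and keep the boxes of both.

    Images are nested lists of pixel values; boxes are
    (x_min, y_min, x_max, y_max, label) tuples.
    """
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    return mixed, boxes_a + boxes_b  # union of all target boxes


image_a = [[0.0, 0.0], [0.0, 0.0]]
image_b = [[1.0, 1.0], [1.0, 1.0]]
mixed, boxes = mixup(
    image_a, [(0, 0, 1, 1, "cat")],
    image_b, [(1, 1, 2, 2, "dog")],
    lam=0.7,
)
print(mixed)
print(boxes)
```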

Mix Concentration

Grid search hyperparameter

Defines the concentration parameter value of the Beta probability distribution to generate mix ratios. A larger value will lead to more equal ratios (50% - 50%) for mixing. Mix concentration is only available when Mixup is selected in the Mix Image setting.
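
To see how the concentration value shapes the mix ratios, the sketch below draws ratios from a symmetric Beta distribution and reports how far they fall from 0.5 on average. Using the same value for both Beta parameters is the usual Mixup convention and is an assumption here; the helper names are hypothetical.

```python
import random


def sample_mix_ratios(concentration, n, seed=0):
    """Draw n mix ratios from a symmetric Beta(concentration, concentration)."""
    rng = random.Random(seed)
    return [rng.betavariate(concentration, concentration) for _ in range(n)]


def mean_distance_from_half(ratios):
    """Average absolute distance of the ratios from an equal 50/50 mix."""
    return sum(abs(r - 0.5) for r in ratios) / len(ratios)


for concentration in (0.2, 1.0, 5.0):
    ratios = sample_mix_ratios(concentration, n=5000)
    print(concentration, round(mean_distance_from_half(ratios), 3))
```

Smaller distances for larger concentration values correspond to the "more equal ratios" described above.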

Mix Probability

Grid search hyperparameter

Defines the probability of applying mix augmentation. The mix probability value is used for each batch or mix iteration. Mix probability is available when Mixup is selected in the Mix Image setting.

Example

If the mixing probability is specified as 0.3, mix augmentation will be applied to each batch (or mix iteration) with a probability of 0.3.

Mix Iterations

Grid search hyperparameter

Defines the number of times to apply mix augmentation on each batch. The larger the value, the more images are mixed into a single train sample. Mix iterations is available when you select Mixup in the Mix Image setting.

Architecture Settings

Backbone

Grid search hyperparameter

Defines the backbone neural network architecture to train the model.

Note

H2O Hydrogen Torch provides several state-of-the-art backbone neural network architectures for model training. When you select Faster Rcnn or Fcos as the model type for the experiment, you can input any architecture name from the timm library.

Tip

Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy.

Pretrained

Determines whether to use a pre-trained backbone model for the experiment. By default, this setting is turned On; therefore, the object detection model uses a backbone that was pre-trained on a generic task to encode an image. When turned Off, H2O Hydrogen Torch initializes the backbone weights with random values.

Drop Path Rate

Defines the drop path rate for the Backbone to use during training. The drop path rate prevents co-adaptation of parallel paths in networks, similar to how dropout prevents co-adaptation of activations. If set to Default, H2O Hydrogen Torch picks the default setting for the respective backbone.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Anchor Num Scales

Defines the number of anchor scales to use for each anchor box. You may want to change this to work with more fine-grained scales. Note that changing this setting will reset the head of the pre-trained model; in most use cases, it is recommended to use the default value.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Anchor Scale

Defines the general scale factor for all anchor boxes; you may want to change this if your dataset contains a large number of particularly small or large boxes.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Anchor Aspect Ratios

Defines the different anchor aspect ratios for anchor boxes; in the best case, the selected anchor aspect ratios should match the typical box shapes in the dataset. Note that changing this setting will reset the head of the pre-trained model; in most use cases, it is recommended to use the default value.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.
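
The three anchor settings above interact roughly as in the following sketch, which follows the convention used in common EfficientDet reference implementations: the base size at a feature level is the anchor scale times the stride, each of the num_scales steps multiplies it by a power of 2^(1/num_scales), and each aspect ratio stretches the width and height. The anchor_sizes function, its default values, and the stride of 8 are assumptions for illustration; the defaults in H2O Hydrogen Torch may differ.

```python
def anchor_sizes(stride, anchor_scale=4.0, num_scales=3,
                 aspect_ratios=((1.0, 1.0), (1.4, 0.7), (0.7, 1.4))):
    """Return (width, height) for every anchor at one feature level."""
    sizes = []
    for octave in range(num_scales):
        # Finer-grained scales: more steps between one size and its double.
        base = anchor_scale * stride * 2 ** (octave / num_scales)
        for ratio_x, ratio_y in aspect_ratios:
            sizes.append((base * ratio_x, base * ratio_y))
    return sizes


for width, height in anchor_sizes(stride=8):
    print(round(width, 1), round(height, 1))
```

The number of anchors per location is num_scales times the number of aspect ratios, which is why changing either value changes the shape of the detection head.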

Anchor IOU Match Threshold

Defines the IoU threshold for matching anchor boxes. In particular, the IoU threshold is used to determine whether an anchor box matches a ground truth box.

Example

If you set the Anchor IoU Match Threshold to 0.5, the anchor box will only match a ground truth box if the IoU is greater than 50%.

In other words, the IoU threshold determines positive labels for anchors.
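
The matching rule can be sketched as follows, assuming boxes in (x_min, y_min, x_max, y_max) format. The iou and positive_anchors helpers are hypothetical and only illustrate the threshold comparison, not the full label assignment used during training.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_anchors(anchors, ground_truth, threshold=0.5):
    """Return indices of anchors whose best IoU with any ground truth box exceeds the threshold."""
    return [
        i for i, anchor in enumerate(anchors)
        if max((iou(anchor, gt) for gt in ground_truth), default=0.0) > threshold
    ]


anchors = [(0, 0, 10, 10), (2, 2, 12, 12), (20, 20, 30, 30)]
ground_truth = [(1, 1, 11, 11)]
print(positive_anchors(anchors, ground_truth))
```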

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Num Layers

Specifies the number of final layers from the backbone to be used as feature maps for the model. A larger number means that more final layers of the backbone are extracted and used for the feature pyramid network.

Tip

Tuning this setting can be helpful for the final performance of the trained model.

Note

This setting is available when Faster Rcnn or Fcos is selected as the model type for the experiment.

Fpn Out Channels

Defines the number of output channels in the feature pyramid network. The default value works very well in practice, but increasing or decreasing it can help with underfitting or overfitting.

Note

This setting is available when Faster Rcnn or Fcos is selected as the model type for the experiment.

Training Settings

Box Loss Weight

Defines the weight of the box loss in EfficientDet; it balances the bounding box regression loss against the classification loss.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Focal Cls Loss Alpha

Defines the alpha hyperparameter value in the focal class loss function; for more information, refer to the following paper: Focal Loss for Dense Object Detection.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Focal Cls Loss Gamma

Defines the gamma hyperparameter value in the focal class loss function; for more information, refer to the following paper: Focal Loss for Dense Object Detection.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.
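
For reference, the roles of alpha and gamma can be sketched with the binary focal loss from the cited paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The focal_loss function below is a hypothetical single-prediction helper; its default values of 0.25 and 2.0 are the ones reported in the paper and are not necessarily the H2O Hydrogen Torch defaults.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p is the predicted probability of the positive class; target is 0 or 1.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

A larger gamma down-weights easy examples more strongly, while alpha balances the contribution of positive and negative examples.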

Prediction Settings

Metric IoU Threshold

Defines the Intersection Over Union (IoU) threshold to calculate the selected metric for image object detection.

Note

When calculating metrics, predicted bounding boxes with an IoU (with the true boxes) above the specified IoU threshold will be treated as true positives.

Nms Iou Threshold

Defines the Intersection Over Union (IoU) threshold when calculating post-processing non-maximum suppression (NMS).

Note

Non-maximum suppression (NMS) is a post-processing step that reduces the number of bounding boxes predicted by the model. The NMS algorithm removes overlapping boxes based on the selected IoU threshold and keeps the higher-scoring box.
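
A minimal sketch of greedy NMS is shown below, assuming boxes in (x_min, y_min, x_max, y_max) format and ignoring class labels. The iou and nms helpers are hypothetical and are not the H2O Hydrogen Torch implementation.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

A lower threshold removes more boxes, because even moderately overlapping boxes are then treated as duplicates.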

Max Det Per Image

Defines the maximum number of detections per image that the model will return.

Probability Threshold

Defines the confidence threshold for predicted boxes. Only predicted boxes with a confidence larger than this threshold are added to the validation or test .csv files that come with the model predictions.

Environment Settings

An image object detection experiment does not have specific environment settings besides those specified in the environment settings section of the common experiment settings page.

Logging Settings

Number of Images

This setting defines the number of images to show in the experiment Insights tab.

