Experiment settings: Image object detection¶

In addition to the experiment settings common to all problem types, an image object detection experiment has the specific settings listed and described below.

Model Type¶

For an image object detection experiment, you can specify a model type when defining the experiment settings. To set the model type:

In the Model Type list, select the model that you want to use.
Note
- H2O Hydrogen Torch supports the following model types: Efficientdet, Faster Rcnn, and Fcos.
- When defining an image object detection experiment, the selected experience level and model type determine the available settings.

Efficientdet¶

EfficientDet models are among the most popular models for image object detection. They use EfficientNet models as the backbone and a weighted bi-directional feature pyramid network (BiFPN) as the feature network.

Note

EfficientDet is the default model type for image object detection in H2O Hydrogen Torch. To learn more about EfficientDet, see EfficientDet: Scalable and Efficient Object Detection.

Faster Rcnn¶

Faster Region-based Convolutional Neural Networks (FasterRCNN) are an advancement of the classical region-based convolutional neural network (RCNN) architecture. The core idea of RCNN is to apply selective search to extract regions of interest (ROIs) from an image, where each ROI might represent the bounding box of an object. Each ROI is fed through a neural network to produce output features used to classify the type of object. FasterRCNN replaces selective search with a region proposal network (RPN) that shares full-image convolutional features with the detection network. This enables nearly cost-free region proposals, significantly improving the training and inference process compared to classical RCNN or Fast RCNN networks.

Note

The implementation of FasterRCNN in H2O Hydrogen Torch lets you select a pre-trained vision backbone from an extensive list.
To learn more about FasterRCNN, see Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

Fcos¶

Both EfficientDet and FasterRCNN are so-called anchor-based object detection models. In contrast, a fully convolutional one-stage object detector (FCOS) solves object detection with per-pixel predictions, similar to how semantic segmentation models operate. FCOS is free of anchor boxes and region proposals.

Note

The implementation of FCOS in H2O Hydrogen Torch lets you select a pre-trained vision backbone from an extensive list.
To learn more about FCOS, see FCOS: Fully Convolutional One-Stage Object Detection.

Dataset Settings¶

Data Folder¶

Defines the folder location of the images to use for the experiment. When the experiment is running, H2O Hydrogen Torch will load images from this folder.

Data Folder Test¶

Defines the folder location of the images H2O Hydrogen Torch will use to test the model. H2O Hydrogen Torch will load images from this folder when testing the model. This setting is only available if a test dataframe is selected.

Note

The Data Folder Test setting will appear when you specify a test dataframe using the Test Dataframe setting.

Class Name Column¶

Defines the dataset column containing a list of class names that H2O Hydrogen Torch will use for each bounding box.

X Min Column¶

Defines the dataset column containing a list of minimum X positions H2O Hydrogen Torch will use for each bounding box.

Y Min Column¶

Defines the dataset column containing a list of minimum Y positions H2O Hydrogen Torch will use for each bounding box.

X Max Column¶

Defines the dataset column containing a list of maximum X positions H2O Hydrogen Torch will use for each bounding box.

Y Max Column¶

Defines the dataset column containing a list of maximum Y positions H2O Hydrogen Torch will use for each bounding box.

Image Column¶

Defines the dataframe column storing the names of images that H2O Hydrogen Torch will load from the data folder and data folder test when training and testing the model.
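To make the column settings above concrete, here is a hypothetical layout of the training dataframe, written as in-memory Python rows. Each row describes one image, and each box column holds a list with one entry per bounding box. The column names (image, class_name, x_min, and so on) are illustrative only. The validate_row helper is a hypothetical pre-flight check, not part of H2O Hydrogen Torch.

```python
from typing import Any, Dict, List

# Hypothetical column names; in an experiment these are whatever you select in
# the Class Name Column and X/Y Min/Max Column settings.
BOX_COLUMNS = ["class_name", "x_min", "y_min", "x_max", "y_max"]

# One row per image; each box column holds one entry per bounding box.
rows: List[Dict[str, Any]] = [
    {"image": "img_001.jpg", "class_name": ["cat", "dog"],
     "x_min": [10, 120], "y_min": [20, 40], "x_max": [90, 200], "y_max": [110, 150]},
    {"image": "img_002.jpg", "class_name": ["dog"],
     "x_min": [5], "y_min": [15], "x_max": [60], "y_max": [80]},
]


def validate_row(row: Dict[str, Any]) -> List[str]:
    """Return the problems found in one row (an empty list if the row looks valid)."""
    problems = []
    lengths = {col: len(row[col]) for col in BOX_COLUMNS}
    if len(set(lengths.values())) != 1:
        # Every box needs exactly one class name and four coordinates.
        problems.append(f"{row['image']}: box lists differ in length {lengths}")
        return problems
    for i, (x0, y0, x1, y1) in enumerate(
        zip(row["x_min"], row["y_min"], row["x_max"], row["y_max"])
    ):
        if x0 >= x1 or y0 >= y1:
            problems.append(f"{row['image']}: box {i} has min >= max")
    return problems


for row in rows:
    print(row["image"], validate_row(row) or "ok")
```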

Image Settings¶

Image Width¶

Defines the width H2O Hydrogen Torch will use to rescale the images for training and predictions.

Note

Depending on the original image size, a larger width can lead to higher accuracy.

Image Height¶

Defines the height H2O Hydrogen Torch will use to rescale the images for training and predictions.

Note

Depending on the original image size, a larger height can lead to higher accuracy.
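Images are rescaled to Image Width by Image Height, so their bounding boxes must be rescaled by the same factors to stay aligned with the objects. The sketch below assumes pixel coordinates and a plain resize that stretches each axis independently. Whether H2O Hydrogen Torch pads images or preserves their aspect ratio is not stated here, and the function name is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def rescale_boxes(boxes: List[Box], orig_size: Tuple[int, int],
                  new_size: Tuple[int, int]) -> List[Box]:
    """Rescale pixel boxes from orig_size to new_size, both given as (width, height).

    Assumes a plain resize that stretches each axis independently
    (no padding, no aspect-ratio preservation).
    """
    (ow, oh), (nw, nh) = orig_size, new_size
    return [(x0 * nw / ow, y0 * nh / oh, x1 * nw / ow, y1 * nh / oh)
            for x0, y0, x1, y1 in boxes]


# A 1024x768 image rescaled to Image Width=512 and Image Height=512.
print(rescale_boxes([(100, 150, 300, 450)], (1024, 768), (512, 512)))
# -> [(50.0, 100.0, 150.0, 300.0)]
```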

Image Channels¶

Defines the number of channels the train images contain.

Note

Typically, images have three input channels (red, green, and blue (RGB)), while grayscale images have only one. When you provide image data in a NumPy data format, any number of channels is allowed, which is why data scientists can specify the number of channels.
The defined number of channels also applies to the provided validation and test datasets.

Image Normalization¶

Grid search hyperparameter

Defines the transformation used to normalize the image data before training the model.

Note

Usually, state-of-the-art image models normalize the training images per input channel by subtracting a predefined mean and dividing by a predefined standard deviation.
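Per-channel normalization can be sketched in NumPy as follows. The function scales uint8 pixels to [0, 1] and then standardizes each channel. It assumes the widely used ImageNet channel means and standard deviations. The values behind each Image Normalization option in H2O Hydrogen Torch may differ.

```python
import numpy as np

# ImageNet channel statistics, a common choice for pretrained backbones.
# Assumption: the exact values depend on the selected Image Normalization option.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(image: np.ndarray) -> np.ndarray:
    """Normalize an HxWxC uint8 image: scale to [0, 1], then (x - mean) / std per channel."""
    x = image.astype(np.float32) / 255.0
    return (x - MEAN) / STD  # broadcasts over the last (channel) axis


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
out = normalize(img)
print(out.shape, out.dtype)
```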

Augmentation Settings¶

Augmentations Strategy¶

Grid search hyperparameter

Defines the augmentation strategy to apply to the input images. Soft, Medium, and Hard values correspond to the strength of the augmentations to apply.

Options

Soft: The Soft strategy applies image Resize and random HorizontalFlip during model training while applying image Resize during model inference.
Medium: The Medium strategy adds ShiftScaleRotate and CoarseDropout to the list of the train augmentations.
Hard: The Hard strategy applies RandomResizedCrop (instead of Resize) during model training while adding RandomBrightnessContrast to the list of train augmentations.
Custom: The Custom strategy allows users to use their own augmentations that can be defined in the following two settings:
- Custom train augmentations
- Custom inference augmentations

Note

Augmentations are ways to modify train images while keeping the target values valid, such as flipping the image or adding noise. Distorting training images does not change the expected prediction of the model but enriches the training data. Augmentations help the model generalize better and improve its accuracy.
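For object detection, keeping target values valid means that a geometric augmentation such as a horizontal flip must also move the bounding boxes. Here is a minimal NumPy sketch, assuming (x_min, y_min, x_max, y_max) pixel boxes and a hypothetical function name. Libraries such as Albumentations perform this bookkeeping automatically when given bounding-box parameters.

```python
import numpy as np


def hflip_with_boxes(image: np.ndarray, boxes: np.ndarray):
    """Flip an HxWxC image left-right and mirror its boxes so they stay valid.

    boxes: array of shape (N, 4) with (x_min, y_min, x_max, y_max) in pixels.
    """
    width = image.shape[1]
    flipped = image[:, ::-1]
    new_boxes = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes


img = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = np.array([[10, 20, 50, 80]], dtype=np.float32)
_, flipped_boxes = hflip_with_boxes(img, boxes)
print(flipped_boxes)  # x range 10..50 becomes 150..190
```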

Custom Train Augmentations¶

Defines a list of augmentations to use for the train data. The format is the .json output of the albumentations.save() function from the Albumentations library. You can use the IMAGE_HEIGHT and IMAGE_WIDTH placeholders to refer to the image dimensions from the experiment configuration.

Note

Augmentations are ways to modify train images while keeping the target values valid, such as flipping the image or adding noise. Distorting training images does not change the expected prediction of the model but enriches the training data. Augmentations help the model generalize better and improve its accuracy. Augmentations are applied to every image at each epoch with the provided probability.

Custom Inference Augmentations¶

Defines a list of inference augmentations to apply to the test and validation data. The format is the .json output of the albumentations.save() function from the Albumentations library. You can use the IMAGE_HEIGHT and IMAGE_WIDTH placeholders to refer to the image dimensions from the experiment configuration.

Note

Inference augmentations serve the same purpose as training augmentations, but the difference is that inference augmentations are applied to validation and test data. Typically, inference augmentations only contain resizing or very simple augmentations.

Mix Image¶

Grid search hyperparameter

Defines the image mix augmentation to use during model training. If this setting has Disabled selected, no mix augmentation is applied. Mixup and Cutmix options correspond to the mix augmentation to apply:

Options

Mixup: Mixup overlays (mixes) two images on top of each other based on a random ratio. To learn more about this approach, refer to the following article: mixup: Beyond Empirical Risk Minimization.
Cutmix: Cutmix replaces an image region with a patch from another image; the region size is based on a random ratio. To learn more about this approach, refer to the following article: CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features.

Note

For image object detection, H2O Hydrogen Torch uses the union of all target boxes of the mixed images for the Mixup augmentation. For the Cutmix augmentation, it instead uses the target boxes from the corresponding region of each image. Also, during the Cutmix augmentation, H2O Hydrogen Torch cuts out and replaces only the corners of the images with a patch from another image.
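The Mixup behavior for object detection can be sketched as follows. The function blends two equally sized images with a ratio drawn from a Beta distribution and keeps the union of their boxes and labels. This is an illustrative NumPy sketch, not the H2O Hydrogen Torch implementation, and the function name and arguments are hypothetical.

```python
import numpy as np


def mixup_detection(img_a, boxes_a, labels_a, img_b, boxes_b, labels_b,
                    concentration=1.0, rng=None):
    """Blend two same-sized images and keep the union of their targets.

    The mix ratio lam is drawn from Beta(concentration, concentration).
    Returns (mixed_image, boxes, labels, lam).
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(concentration, concentration)
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    # Union of targets: every box from both images stays a valid target.
    boxes = np.concatenate([boxes_a, boxes_b], axis=0)
    labels = np.concatenate([labels_a, labels_b], axis=0)
    return mixed, boxes, labels, lam


rng = np.random.default_rng(0)
a = np.full((64, 64, 3), 200, dtype=np.uint8)
b = np.zeros((64, 64, 3), dtype=np.uint8)
mixed, boxes, labels, lam = mixup_detection(
    a, np.array([[0, 0, 10, 10]]), np.array([1]),
    b, np.array([[20, 20, 40, 40], [5, 5, 15, 15]]), np.array([2, 3]),
    concentration=1.0, rng=rng)
print(boxes.shape, labels, round(float(lam), 3))
```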

Mix Concentration¶

Grid search hyperparameter

Defines the concentration parameter value of the Beta probability distribution used to generate mix ratios. A larger value leads to ratios closer to an even 50%/50% mix. Mix concentration is only available when Mixup is selected in the Mix Image setting.
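You can check the effect of the concentration value numerically. The snippet samples mix ratios from a symmetric Beta distribution and reports how far they fall from 0.5 on average. The helper name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)


def mean_distance_from_half(concentration: float, n: int = 100_000) -> float:
    """Average |lam - 0.5| for mix ratios lam ~ Beta(concentration, concentration)."""
    lam = rng.beta(concentration, concentration, size=n)
    return float(np.mean(np.abs(lam - 0.5)))


# The printed distance shrinks as the concentration grows.
for c in (0.2, 1.0, 5.0, 20.0):
    print(f"concentration={c:>5}: mean |lam - 0.5| = {mean_distance_from_half(c):.3f}")
```

Small values push ratios toward 0 or 1, so one image dominates. Large values concentrate the ratios around an even mix.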

Mix Probability¶

Grid search hyperparameter

Defines the probability value to apply mix augmentation. The mix probability value is used for each batch or mix iteration. Mix probability is available when Mixup is selected in the Mix Image setting.

Example

If the mixing probability is specified as 0.3, mix augmentation will be applied to each batch (or mix iteration) with a probability of 0.3.

Mix Iterations¶

Grid search hyperparameter

Defines the number of times to apply mix augmentation on each batch. The larger the value, the more images are mixed into a single train sample. Mix iterations is available when you select Mixup in the Mix Image setting.
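Mix Probability and Mix Iterations might interact as sketched below. The sketch assumes that each iteration performs an independent probability check and, when it fires, mixes the batch with a shuffled copy of itself. The function and its arguments are hypothetical, and box handling is omitted.

```python
import numpy as np


def apply_batch_mixing(batch, mix_probability, mix_iterations, concentration, rng):
    """Hypothetical sketch of Mixup applied to a batch of images (N, H, W, C).

    Each of the mix_iterations rounds fires with probability mix_probability and
    blends the batch with a shuffled copy of itself using a Beta-distributed
    ratio, so more rounds can blend more images into each train sample.
    Returns the mixed batch and the number of rounds that fired.
    """
    mixed = batch.astype(np.float32)
    rounds_applied = 0
    for _ in range(mix_iterations):
        if rng.random() >= mix_probability:
            continue  # this round is skipped
        perm = rng.permutation(len(mixed))
        lam = rng.beta(concentration, concentration)
        mixed = lam * mixed + (1.0 - lam) * mixed[perm]
        rounds_applied += 1
    return mixed, rounds_applied


rng = np.random.default_rng(0)
batch = rng.random((4, 8, 8, 3))
# With mix_probability=0.3 and one iteration, roughly 30% of batches get mixed.
fired = [apply_batch_mixing(batch, 0.3, 1, 1.0, rng)[1] for _ in range(10_000)]
print(np.mean(fired))
```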

Architecture Settings¶

Backbone¶

Grid search hyperparameter

Defines the backbone neural network architecture to train the model.

Note

H2O Hydrogen Torch provides several state-of-the-art backbone architectures for model training. When you select Faster RCnn or Fcos as the model type for the experiment, you can input any architecture name from the timm library.

Tip

Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy.

Pretrained¶

Determines whether to use a pre-trained backbone model for the experiment. By default, this setting is turned On; therefore, the object detection model uses a backbone pre-trained on a generic task to encode an image. When turned Off, H2O Hydrogen Torch initializes the weights with random values.

Drop Path Rate¶

Defines the drop path rate for the backbone to use during training. Drop path prevents co-adaptation of parallel paths in networks, similar to how dropout prevents co-adaptation of activations. If set to Default, the default setting for the respective backbone is used.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.
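Drop path, also known as stochastic depth, randomly skips a residual branch for whole samples during training. The NumPy sketch below follows the common formulation, similar in spirit to the DropPath layer in the timm library. Surviving samples are rescaled by 1 / (1 - rate) so that the expected output stays unchanged. The sketch illustrates the idea and is not the backbone code H2O Hydrogen Torch runs.

```python
import numpy as np


def drop_path(residual: np.ndarray, drop_prob: float, training: bool,
              rng: np.random.Generator) -> np.ndarray:
    """Stochastic depth on a residual branch output of shape (N, ...).

    During training, each sample's branch output is zeroed with probability
    drop_prob, and survivors are scaled by 1 / (1 - drop_prob) to keep the
    expected value. At inference, the branch passes through untouched.
    """
    if not training or drop_prob == 0.0:
        return residual
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast over the remaining axes.
    mask_shape = (residual.shape[0],) + (1,) * (residual.ndim - 1)
    mask = rng.random(mask_shape) < keep_prob
    return residual * mask / keep_prob


rng = np.random.default_rng(0)
branch = np.ones((8, 4))  # stand-in for a residual branch output, 8 samples
out = drop_path(branch, drop_prob=0.25, training=True, rng=rng)
print(out[:, 0])  # each sample is either 0.0 (dropped) or 1 / (1 - 0.25)
```

In a residual block, this would be used as x + drop_path(branch(x), ...), so a dropped sample falls back to the identity shortcut.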

Anchor Num Scales¶

Defines the number of anchor scales to use at each anchor position. You may want to change this to work with more fine-grained scales. Note that changing this setting resets the head of the pre-trained model; in most use cases, it is recommended to use the default value.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Anchor Scale¶

Defines the general scale factor for all anchor boxes. You may want to change this if your dataset contains a large number of particularly small or large boxes.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Anchor Aspect Ratios¶

Defines the different aspect ratios for anchor boxes. Ideally, the selected anchor aspect ratios should match the typical box shapes in the dataset. Note that changing this setting resets the head of the pre-trained model; in most use cases, it is recommended to use the default value.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.
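The sketch below shows how Anchor Num Scales, Anchor Scale, and Anchor Aspect Ratios might combine. It computes anchor widths and heights at one feature level, following the scheme of the original EfficientDet implementation. The example values are an assumption about what H2O Hydrogen Torch uses: 3 scales, an anchor scale of 4.0, and aspect ratios (1.0, 1.0), (1.4, 0.7), (0.7, 1.4), as used by common EfficientDet implementations such as the effdet package. The function name is hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def anchor_shapes(stride: int, anchor_scale: float, num_scales: int,
                  aspect_ratios: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (width, height) of every anchor shape at one feature level.

    base size = anchor_scale * stride * 2 ** (i / num_scales), i = 0..num_scales-1;
    each (w, h) aspect-ratio pair then stretches that base size.
    """
    shapes = []
    for i, (aw, ah) in itertools.product(range(num_scales), aspect_ratios):
        base = anchor_scale * stride * 2 ** (i / num_scales)
        shapes.append((base * aw, base * ah))
    return shapes


# Stride 8 corresponds to the highest-resolution pyramid level in EfficientDet.
for w, h in anchor_shapes(stride=8, anchor_scale=4.0, num_scales=3,
                          aspect_ratios=[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]):
    print(f"{w:6.1f} x {h:6.1f}")
```

Every anchor position at this level gets num_scales x len(aspect_ratios) shapes. Changing the number of scales or aspect ratios therefore changes the number of outputs the head predicts per position, which would explain why these settings reset the head.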

Anchor IOU Match Threshold¶

Defines the IoU threshold for matching anchor boxes. In particular, the IoU threshold is used to determine whether an anchor box matches a ground truth box.

Example

If you set the Anchor IoU Match Threshold to 0.5, the anchor box will only match a ground truth box if the IoU is greater than 50%.

In other words, the IoU threshold determines positive labels for anchors.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.
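Here is a minimal NumPy sketch of IoU-based anchor matching. Each anchor takes the ground-truth box it overlaps most, and counts as positive only when that IoU exceeds the threshold. Real EfficientDet matchers can add refinements, such as a separate lower threshold below which anchors count as negatives. Treat this as a simplified illustration with hypothetical function names.

```python
import numpy as np


def iou_matrix(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) arrays of (x_min, y_min, x_max, y_max)."""
    x0 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y0 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x1 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y1 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def match_anchors(anchors: np.ndarray, gt_boxes: np.ndarray,
                  iou_threshold: float = 0.5) -> np.ndarray:
    """Matched ground-truth index per anchor, or -1 for unmatched anchors.

    Assumes at least one ground-truth box.
    """
    ious = iou_matrix(anchors, gt_boxes)  # shape (num_anchors, num_gt)
    best_gt = ious.argmax(axis=1)
    best_iou = ious.max(axis=1)
    return np.where(best_iou > iou_threshold, best_gt, -1)


anchors = np.array([[0, 0, 10, 10], [5, 5, 15, 15], [50, 50, 60, 60]], dtype=float)
gt = np.array([[0, 0, 10, 12]], dtype=float)
print(match_anchors(anchors, gt, iou_threshold=0.5).tolist())  # -> [0, -1, -1]
```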

Num Layers¶

Specifies the number of final layers from the backbone to be used as feature maps for the model. A larger number means that more final layers of the backbone are extracted and used for the feature pyramid network.

Tip

Tuning this setting can be helpful for the final performance of the trained model.

Note

This setting is available when Faster RCnn or Fcos is selected as the model type for the experiment.

Fpn Out Channels¶

Defines the number of output channels of the feature pyramid network. The default value works very well in practice, but increasing or decreasing it can help with underfitting or overfitting.

Note

This setting is available when Faster RCnn or Fcos is selected as the model type for the experiment.

Training Settings¶

Box Loss Weight¶

Defines the weight of the box loss in EfficientDet; it balances the bounding box regression loss against the classification loss.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.
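Assuming the weighted-sum formulation used by common EfficientDet implementations, the Box Loss Weight w_box scales the box regression loss before it is added to the classification loss:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cls}} + w_{\text{box}} \, \mathcal{L}_{\text{box}}
```

A larger w_box makes localization errors count more relative to classification errors.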

Focal Cls Loss Alpha¶

Defines the alpha hyperparameter value in the focal classification loss function; for more information, refer to the following paper: Focal Loss for Dense Object Detection.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.

Focal Cls Loss Gamma¶

Defines the gamma hyperparameter value in the focal classification loss function; for more information, refer to the following paper: Focal Loss for Dense Object Detection.

Note

This setting is available when Efficientdet is selected as the model type for the experiment.
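The alpha and gamma settings enter the focal loss from the paper above as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class. The NumPy sketch below implements this binary (sigmoid) form element-wise. Its defaults, alpha = 0.25 and gamma = 2, are the values reported in the paper; whether H2O Hydrogen Torch uses the same defaults is not stated here.

```python
import numpy as np


def focal_loss(probs: np.ndarray, targets: np.ndarray,
               alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-7) -> np.ndarray:
    """Element-wise binary focal loss: -alpha_t * (1 - p_t) ** gamma * log(p_t).

    probs: predicted probabilities after a sigmoid; targets: 0/1 labels.
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(targets == 1, probs, 1.0 - probs)      # probability of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


probs = np.array([0.1, 0.9])  # an easy negative and a hard negative
targets = np.array([0, 0])
print(focal_loss(probs, targets))  # the hard negative dominates the loss
```

With gamma = 0 and alpha = 0.5, the loss reduces to half the binary cross-entropy. Raising gamma shrinks the loss of easy, confidently classified examples, so training focuses on hard ones.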

Prediction Settings¶

Metric IoU Threshold¶

Defines the Intersection Over Union (IoU) threshold to calculate the selected metric for image object detection.

Note

When calculating metrics, predicted bounding boxes with an IoU (with the true boxes) above the specified IoU threshold will be treated as true positives.
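The following sketch shows one common way to turn the Metric IoU Threshold into true positives. Predictions are processed in descending score order, and each one claims the best still-unmatched true box whose IoU exceeds the threshold. Duplicate predictions become false positives, and unclaimed true boxes become false negatives. This greedy rule is typical of mAP-style metrics but is an assumption here, the function names are hypothetical, and per-class matching is omitted.

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(pred_boxes, pred_scores, true_boxes, iou_threshold=0.5):
    """Return (TP, FP, FN) for one image and one class using greedy matching."""
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched = set()
    tp = 0
    for i in order:
        best_j, best_iou = -1, iou_threshold
        for j, true_box in enumerate(true_boxes):
            if j in matched:
                continue  # each true box can be claimed only once
            overlap = iou(pred_boxes[i], true_box)
            if overlap > best_iou:  # strictly above the threshold
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    return tp, len(pred_boxes) - tp, len(true_boxes) - tp


preds = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
truths = [(0, 0, 10, 10)]
print(count_true_positives(preds, scores, truths, iou_threshold=0.5))  # -> (1, 2, 0)
```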

Nms Iou Threshold¶

Defines the Intersection Over Union (IoU) threshold used in the non-maximum suppression (NMS) post-processing step.

Note

Non-maximum suppression (NMS) is a post-processing step that reduces the number of bounding boxes predicted by the model. The NMS algorithm removes overlapping boxes based on the selected IoU threshold: when two boxes overlap with an IoU above the threshold, NMS keeps the higher-scoring box.
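Greedy NMS can be sketched in a few lines of pure Python. The function visits boxes from highest to lowest score and keeps a box only if its IoU with every box kept so far is at most the threshold. This is the textbook algorithm, not necessarily H2O Hydrogen Torch's exact code, which may, for example, run NMS per class. In PyTorch code, you would typically call torchvision.ops.nms(boxes, scores, iou_threshold) instead.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """IoU of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap too much with any already-kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))  # -> [0, 2]
```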

Max Det Per Image¶

Defines the maximum number of detections per image that the model will return.

Probability Threshold¶

Defines the probability threshold for predicted boxes: only boxes with a confidence larger than this threshold are added to the validation or test .csv files that come with the model predictions.
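Max Det Per Image and Probability Threshold both prune the final predictions. The sketch below applies them in one assumed order: first the threshold, then a cap that keeps only the highest-scoring boxes. The function name and the (class, score, box) tuple layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical detection layout: (class_name, score, (x_min, y_min, x_max, y_max)).
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def postprocess(detections: List[Detection], probability_threshold: float,
                max_det_per_image: int) -> List[Detection]:
    """Drop detections at or below the threshold, then keep at most max_det_per_image by score."""
    kept = [d for d in detections if d[1] > probability_threshold]
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_det_per_image]


dets = [
    ("cat", 0.95, (10, 10, 50, 50)),
    ("dog", 0.40, (60, 60, 90, 90)),
    ("cat", 0.72, (12, 30, 40, 70)),
    ("bird", 0.10, (0, 0, 5, 5)),
]
print(postprocess(dets, probability_threshold=0.3, max_det_per_image=2))
```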

Environment Settings¶

An image object detection experiment does not have specific environment settings besides those specified in the environment settings section of the common experiment settings page.

Logging Settings¶

Number of Images¶

This setting defines the number of images to show in the experiment Insights tab.
