Experiment settings: Image object detection¶
Besides having certain common experiment settings with other problem types, the specific settings for an image object detection experiment are listed and described below.
Model Type¶
For an image object detection experiment it is possible to specify a model type when defining the experiment's settings. To set the model type, consider the following instructions when defining the experiment settings:
- In the Model Type list, select the model that you want to use.
Note
- H2O Hydrogen Torch supports the following model types: Efficientdet, Faster Rcnn, and Fcos.
- When defining an image object detection experiment, the selected experience level and model type determine the available settings.
Efficientdet¶
EfficientDet models are among the most popular models for image object detection. They use EfficientNet models as a backbone and a weighted bi-directional feature pyramid network (BiFPN) as the feature network.
Note
EfficientDet is the default model type for image object detection in H2O Hydrogen Torch. To learn more about EfficientDet, see EfficientDet: Scalable and Efficient Object Detection.
Faster Rcnn¶
Faster Region-based Convolutional Neural Networks (FasterRCNN) are an advancement of classical Region-based Convolutional Neural Network (RCNN) architectures. The core idea of RCNN is to apply selective search to extract regions of interest (ROIs) from an image, where each ROI might represent the bounding box of an object. Each ROI is fed through a neural network to produce output features used to classify the type of object. FasterRCNN replaces selective search with a region proposal network that shares full-image convolutional features with the detection network and thus enables nearly cost-free region proposals, significantly speeding up training and inference compared to classical RCNN or Fast RCNN networks.
Note
- The implementation of FasterRCNN in H2O Hydrogen Torch enables the selection of a pre-trained vision backbone from an extensive selection.
- To learn more about FasterRCNN, see Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
Fcos¶
Both EfficientDet and FasterRCNN are so-called anchor-based object detection models. In contrast, a fully convolutional one-stage object detector (FCOS) solves object detection per pixel, similar to how semantic segmentation models operate. FCOS is anchor-box free and proposal free.
Note
- The implementation of FCOS in H2O Hydrogen Torch enables the selection of a pre-trained vision backbone from an extensive selection.
- To learn more about FCOS, see FCOS: Fully Convolutional One-Stage Object Detection.
Dataset Settings¶
Data Folder¶
Defines the folder location of the images to use for the experiment. When the experiment is running, H2O Hydrogen Torch will load images from this folder.
Data Folder Test¶
Defines the folder location of the images H2O Hydrogen Torch will use to test the model. H2O Hydrogen Torch will load images from this folder when testing the model. This setting is only available if a test dataframe is selected.
Note
The Data Folder Test setting will appear when you specify a test dataframe using the Test Dataframe setting.
Class Name Column¶
Defines the dataset column containing a list of class names that H2O Hydrogen Torch will use for each bounding box.
X Min Column¶
Defines the dataset column containing a list of minimum X positions H2O Hydrogen Torch will use for each bounding box.
Y Min Column¶
Defines the dataset column containing a list of minimum Y positions H2O Hydrogen Torch will use for each bounding box.
X Max Column¶
Defines the dataset column containing a list of maximum X positions H2O Hydrogen Torch will use for each bounding box.
Y Max Column¶
Defines the dataset column containing a list of maximum Y positions H2O Hydrogen Torch will use for each bounding box.
Image Column¶
Defines the dataframe column storing the names of images that H2O Hydrogen Torch will load from the data folder and data folder test when training and testing the model.
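As an illustration of how the image, class name, and coordinate columns fit together, the following sketch checks one row of a hypothetical annotation table. The column names (image, class_name, x_min, and so on) and the one-list-entry-per-box layout are assumptions made for this example; they are not taken from H2O Hydrogen Torch itself.

```python
# Hypothetical annotation row: one image, with one list entry per bounding box.
row = {
    "image": "street_001.jpg",
    "class_name": ["car", "person"],
    "x_min": [34, 210],
    "y_min": [50, 80],
    "x_max": [180, 260],
    "y_max": [140, 220],
}


def validate_row(row):
    """Return a list of problems found in one annotation row (empty if none)."""
    columns = ["class_name", "x_min", "y_min", "x_max", "y_max"]
    # Every list-valued column must describe the same number of boxes.
    if len({len(row[name]) for name in columns}) != 1:
        return ["columns have different numbers of boxes"]
    problems = []
    boxes = zip(row["x_min"], row["y_min"], row["x_max"], row["y_max"])
    for index, (x0, y0, x1, y1) in enumerate(boxes):
        # A valid box needs its minimum corner strictly before its maximum corner.
        if x0 >= x1 or y0 >= y1:
            problems.append(f"box {index} has non-positive width or height")
    return problems


print(validate_row(row))
```

A check like this can be run before importing a dataset to catch swapped minimum and maximum columns early.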
Image Settings¶
Image Width¶
Defines the width H2O Hydrogen Torch will use to rescale the images for training and predictions.
Note
Depending on the original image size, a bigger width can generate a higher accuracy value.
Image Height¶
Defines the height H2O Hydrogen Torch will use to rescale the images for training and predictions.
Note
Depending on the original image size, a bigger height can generate a higher accuracy value.
Image Channels¶
Defines the number of channels the train images contain.
Note
- Typically, images have three input channels (red, green, and blue (RGB)), but grayscale images have only one. When you provide image data in a NumPy data format, any number of channels is allowed. For this reason, data scientists can specify the number of channels.
- The defined number of channels also applies to the provided validation and test datasets.
Image Normalization¶
Grid search hyperparameter
Defines the transformer to normalize the image data before training the model.
Note
Usually, state-of-the-art image models normalize the training images by scaling values of each of the input channels to predefined means and standard deviations.
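The following is a minimal sketch of the per-channel normalization described in the note above. The mean and standard deviation values are the widely used ImageNet statistics and are an assumption for illustration; the values applied by a given normalization option in H2O Hydrogen Torch may differ.

```python
# Widely used ImageNet channel statistics (assumed here for illustration only).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


print(normalize_pixel((124, 116, 104)))
```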
Augmentation Settings¶
Augmentations Strategy¶
Grid search hyperparameter
Defines the augmentation strategy to apply to the input images. Soft, Medium, and Hard values correspond to the strength of the augmentations to apply.
Note
Augmentations are ways to modify train images while keeping the target values valid, such as flipping the image or adding noise. Distorting training images does not influence the expected prediction of the model but enriches the training data. Augmentations help the model generalize better and improve its accuracy.
Custom Train Augmentations¶
Defines a list of augmentations to use for the train data. The format is the .json output of the albumentations.save() function call from the Albumentations library. The IMAGE_HEIGHT and IMAGE_WIDTH placeholders can be used to reference the image dimensions from the experiment configuration.
Note
Augmentations are ways to modify train images while keeping the target values valid, such as flipping the image or adding noise. Distorting training images does not influence the expected prediction of the model but enriches the training data. Augmentations help the model generalize better and improve its accuracy. Augmentations are applied to every image at each epoch with the provided probability.
Custom Inference Augmentations¶
Defines a list of inference augmentations to be applied to the test and validation data. The format is the .json output of the albumentations.save() function call from the Albumentations library. The IMAGE_HEIGHT and IMAGE_WIDTH placeholders can be used to reference the image dimensions from the experiment configuration.
Note
Inference augmentations serve the same purpose as training augmentations, but the difference is that inference augmentations are applied to validation and test data. Typically, inference augmentations only contain resizing or very simple augmentations.
Mix Image¶
Grid search hyperparameter
Defines the image mix augmentation to use during model training. If Disabled is selected, no mix augmentation is applied. The Mixup and Cutmix options correspond to the mix augmentation to apply.
Note
In particular, for image object detection, for the Mixup augmentation, H2O Hydrogen Torch uses the union of all the target boxes in mixed images. In contrast, for the Cutmix augmentation, H2O Hydrogen Torch uses the target boxes from the corresponding region from each image. Also, H2O Hydrogen Torch cuts out and replaces only the corners of the images with a patch from another image during the Cutmix augmentation.
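The Mixup behavior described in the note above can be sketched as follows: the pixel values of two images are blended with a mix ratio, and the target becomes the union of both box lists. The function name, the nested-list image representation, and the box tuple layout are hypothetical choices for this example and do not reflect the internal implementation.

```python
def mixup_detection(image_a, boxes_a, image_b, boxes_b, ratio):
    """Blend two same-sized grayscale images and keep the boxes of both."""
    mixed = [
        [ratio * pa + (1.0 - ratio) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    # For detection, the mixed sample carries the union of all target boxes.
    return mixed, boxes_a + boxes_b


# Tiny 1x2 "images" and (class, x_min, y_min, x_max, y_max) boxes.
image_a, boxes_a = [[0.0, 1.0]], [("car", 0, 0, 1, 1)]
image_b, boxes_b = [[1.0, 1.0]], [("person", 1, 0, 2, 1)]
mixed, boxes = mixup_detection(image_a, boxes_a, image_b, boxes_b, ratio=0.25)
print(mixed, boxes)
```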
Mix Concentration¶
Grid search hyperparameter
Defines the concentration parameter value of the Beta probability distribution to generate mix ratios. A larger value will lead to more equal ratios (50% - 50%) for mixing. Mix concentration is only available when Mixup is selected in the Mix Image setting.
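To see why a larger concentration leads to more equal mix ratios, the sketch below draws ratios from a symmetric Beta distribution with Python's standard library and measures how far they fall from 0.5 on average. Using the same concentration value for both Beta parameters is the usual Mixup convention and is assumed here; the helper names are hypothetical.

```python
import random


def sample_mix_ratios(concentration, count, seed=0):
    """Draw mix ratios from a symmetric Beta(concentration, concentration)."""
    rng = random.Random(seed)
    return [rng.betavariate(concentration, concentration) for _ in range(count)]


def mean_distance_from_half(ratios):
    """Average absolute distance of the ratios from an equal 50/50 mix."""
    return sum(abs(ratio - 0.5) for ratio in ratios) / len(ratios)


for concentration in (0.2, 1.0, 5.0):
    ratios = sample_mix_ratios(concentration, 10_000)
    print(concentration, round(mean_distance_from_half(ratios), 3))
```

The printed distance shrinks as the concentration grows, meaning the sampled ratios cluster more tightly around 0.5.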
Mix Probability¶
Grid search hyperparameter
Defines the probability value to apply mix augmentation. The mix probability value is used for each batch or mix iteration. Mix probability is available when Mixup is selected in the Mix Image setting.
Example
If the mixing probability is specified as 0.3, mix augmentation will be applied to each batch (or mix iteration) with a probability of 0.3.
Mix Iterations¶
Grid search hyperparameter
Defines the number of times to apply mix augmentation on each batch. The larger the value, the more images are mixed into a single train sample. Mix iterations is available when you select Mixup in the Mix Image setting.
Architecture Settings¶
Backbone¶
Grid search hyperparameter
Defines the backbone neural network architecture to train the model.
Note
H2O Hydrogen Torch provides several state-of-the-art backbone neural network architectures for model training. When you select Faster RCnn or Fcos as the model type for the experiment, you can input any architecture name from the timm library.
Tip
Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy.
Pretrained¶
Determines whether to use a pre-trained backbone model for the experiment. By default, this setting is turned On; therefore, the object detection model uses a pre-trained backbone model trained on a generic task to encode an image. When turned Off, H2O Hydrogen Torch initializes the weights with random values.
Drop Path Rate¶
Defines the drop path rate for the Backbone to use during training. The drop path rate prevents co-adaptation of parallel paths in networks, similar to how dropout prevents co-adaptation of activations. If set to Default, H2O Hydrogen Torch picks the default setting for the respective backbone.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
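As a rough illustration of what a drop path rate controls, the sketch below applies drop path (also known as stochastic depth) to the output of one residual branch for a single sample. This is a generic formulation under stated assumptions, not the backbone's actual code; the function name and list-based representation are hypothetical.

```python
import random


def drop_path(branch_output, drop_rate, training, rng=random):
    """Randomly drop a whole residual branch for one sample during training."""
    if not training or drop_rate == 0.0:
        return branch_output
    keep_prob = 1.0 - drop_rate
    if rng.random() >= keep_prob:
        # The entire path is dropped, so only the skip connection remains.
        return [0.0 for _ in branch_output]
    # Rescale kept paths so the expected output matches inference behavior.
    return [value / keep_prob for value in branch_output]


print(drop_path([1.0, 2.0], drop_rate=0.2, training=False))
```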
Anchor Num Scales¶
Defines the number of anchor scales to use for each anchor box. You may want to change this to work with more fine-grained scales. Note that changing this setting will reset the head of the pre-trained model; in most use cases, it is recommended to use the default value.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
Anchor Scale¶
Defines the general scale factor for all anchor boxes; you may want to change this if your dataset contains a large number of particularly small or large boxes.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
Anchor Aspect Ratios¶
Defines the different anchor aspect ratios for anchor boxes; in the best case, the selected anchor aspect ratios should match the typical box shapes in the dataset. Note that changing this setting will reset the head of the pre-trained model; in most use cases, it is recommended to use the default value.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
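The three anchor settings above (number of scales, scale, and aspect ratios) jointly determine the anchor shapes at each feature level. The sketch below follows a common convention from anchor-based detectors; the exact formula, the default values, and the width/height interpretation of an aspect ratio are assumptions for illustration and may differ from the EfficientDet implementation used by H2O Hydrogen Torch.

```python
import math


def anchor_shapes(stride, anchor_scale=4.0, num_scales=3,
                  aspect_ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) pairs for the anchors at one feature level."""
    shapes = []
    for octave in range(num_scales):
        # Scales are spread evenly (in log space) between 1x and 2x the base size.
        base = anchor_scale * stride * 2 ** (octave / num_scales)
        for ratio in aspect_ratios:  # ratio is interpreted as width / height
            width = base * math.sqrt(ratio)
            height = base / math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


for width, height in anchor_shapes(stride=8):
    print(round(width, 1), round(height, 1))
```

In this formulation, each feature level gets num_scales times len(aspect_ratios) anchor shapes, which is why changing either setting changes the size of the prediction head.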
Anchor IOU Match Threshold¶
Defines the IoU threshold for matching anchor boxes. In particular, the IoU threshold is used to determine whether an anchor box matches a ground truth box.
Example
If you set the Anchor IoU Match Threshold to 0.5, the anchor box will only match a ground truth box if the IoU is greater than 50%.
In other words, the IoU threshold determines positive labels for anchors.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
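The matching rule from the example above can be sketched with a plain IoU function. Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, and the helper names are hypothetical; this is an illustration of the thresholding idea, not the matcher used internally.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_anchors(anchors, ground_truth, threshold=0.5):
    """Indices of anchors whose best IoU with a ground truth box exceeds the threshold."""
    return [
        index
        for index, anchor in enumerate(anchors)
        if max((iou(anchor, gt) for gt in ground_truth), default=0.0) > threshold
    ]


anchors = [(0, 0, 10, 10), (0, 0, 10, 20), (50, 50, 60, 60)]
ground_truth = [(0, 0, 10, 10)]
print(positive_anchors(anchors, ground_truth, threshold=0.5))
```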
Num Layers¶
Specifies the number of final layers from the backbone to be used as feature maps for the model. A larger number means that more final layers of the backbone are extracted and used for the feature pyramid network.
Tip
Tuning this setting can be helpful for the final performance of the trained model.
Note
This setting is available when Faster RCnn or Fcos is selected as the model type for the experiment.
Fpn Out Channels¶
Defines the number of output channels in the feature pyramid network. The default value works very well in practice, but increasing or decreasing it can help with under- or overfitting.
Note
This setting is available when Faster RCnn or Fcos is selected as the model type for the experiment.
Training Settings¶
Box Loss Weight¶
Defines the weight of the box loss in EfficientDet (a type of object detection model); it is used to balance the loss of the bounding box regression and classification.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
Focal Cls Loss Alpha¶
Defines the alpha hyperparameter value in the focal class loss function; for more information, refer to the following paper: Focal Loss for Dense Object Detection.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
Focal Cls Loss Gamma¶
Defines the gamma hyperparameter value in the focal class loss function; for more information, refer to the following paper: Focal Loss for Dense Object Detection.
Note
This setting is available when Efficientdet is selected as the model type for the experiment.
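To make the roles of alpha and gamma concrete, the sketch below implements the binary focal loss from the referenced paper for a single prediction. The default values shown (alpha 0.25, gamma 2.0) are the ones highlighted in that paper and are not necessarily the defaults in H2O Hydrogen Torch; the function name is hypothetical.

```python
import math


def focal_loss(prob, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability and a 0/1 target."""
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # guard against log(0)
    # gamma down-weights easy examples; alpha balances positives and negatives.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


for prob in (0.6, 0.9, 0.99):
    print(prob, focal_loss(prob, target=1))
```

A larger gamma shrinks the loss of well-classified examples faster, so training focuses on hard examples; with gamma set to 0, the expression reduces to an alpha-weighted cross-entropy.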
Prediction Settings¶
Metric IoU Threshold¶
Defines the Intersection Over Union (IoU) threshold to calculate the selected metric for image object detection.
Note
When calculating metrics, predicted bounding boxes with an IoU (with the true boxes) above the specified IoU threshold will be treated as true positives.
Nms Iou Threshold¶
Defines the Intersection Over Union (IoU) threshold when calculating post-processing non-maximum suppression (NMS).
Note
Non-maximum suppression (NMS) is a post-processing step that reduces the number of bounding boxes predicted by the model. The NMS algorithm removes overlapping boxes based on the selected IoU threshold and keeps the higher-scoring box.
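A minimal sketch of greedy NMS is shown below, assuming boxes are (x_min, y_min, x_max, y_max) tuples with one confidence score each. It illustrates the general algorithm only; the function names are hypothetical, and production implementations are typically vectorized and applied per class.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for index in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[index], boxes[k]) <= iou_threshold for k in keep):
            keep.append(index)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))
```

A lower IoU threshold suppresses more aggressively, while a higher one lets more overlapping boxes survive.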
Max Det Per Image¶
Defines the maximum number of detections per image that the model will return.
Probability Threshold¶
Defines the probability threshold for predicted boxes. Only predicted boxes with a confidence larger than the defined threshold are added to the validation or test .csv files that come with the model predictions.
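The effect of a confidence threshold combined with a cap on detections per image can be sketched as follows. The dictionary layout, the function name, and the order in which the two limits are applied are assumptions for illustration; H2O Hydrogen Torch may apply these settings at different stages.

```python
def select_detections(detections, probability_threshold=0.5, max_det_per_image=100):
    """Keep confident detections, highest score first, up to a maximum count."""
    confident = [d for d in detections if d["score"] > probability_threshold]
    confident.sort(key=lambda d: d["score"], reverse=True)
    return confident[:max_det_per_image]


detections = [
    {"class_name": "car", "score": 0.92},
    {"class_name": "person", "score": 0.35},
    {"class_name": "car", "score": 0.61},
]
print(select_detections(detections, probability_threshold=0.5, max_det_per_image=1))
```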
Environment Settings¶
An image object detection experiment does not have specific environment settings besides those specified in the environment settings section of the common experiment settings page.
Logging Settings¶
Number of Images¶
This setting defines the number of images to show in the experiment Insights tab.