Abstract: Training a machine learning model for object detection
that generalizes well is known to benefit from a training dataset
with diverse examples. However, training datasets usually contain
many repeats of common examples of a class and lack rarely seen
examples. This is due to the process commonly used during human
annotation, in which a person proceeds sequentially through a list
of images until a sufficiently high total number of examples has been labeled.
Instead, the method presented involves an active process where, after
the initial labeling of several images is completed, the next subset
of images for labeling is selected by an algorithm. This process of
algorithmic image selection and manual labeling continues in an
iterative fashion. The selection algorithm is a deep learning
model, based on the U-shaped architecture, that quantifies the
presence of unseen data in each image in order to find the images
that contain the most novel examples. Moreover, the location
of the unseen data in each image is highlighted, aiding the labeler in
spotting these examples. Experiments performed using semiconductor
wafer data show that labeling a subset of the data, curated by this
algorithm, resulted in a model with better performance than a
model produced by sequentially labeling the same amount of data.
Furthermore, performance similar to that of a model trained on
exhaustive labels for the whole dataset is achieved. Overall, the proposed
approach results in a dataset that has a diverse set of examples per
class as well as more balanced classes, which proves beneficial when
training a deep learning model.
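The iterative select-then-label loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `novelty_score` function is a hypothetical placeholder standing in for the U-shaped network that quantifies unseen data per image, and all batch sizes are assumed for the example.

```python
import random

def novelty_score(image, labeled):
    """Placeholder for the U-shaped network's per-image novelty
    estimate (an assumption for this sketch); here it is random."""
    return random.random()

def active_labeling(unlabeled, initial_batch=10, batch_size=5, rounds=3):
    """Iteratively grow the labeled set by picking the most novel images.

    1. A small initial subset is labeled manually.
    2. Each round, the remaining pool is scored for novelty and the
       top-scoring batch is handed to the human labeler.
    """
    labeled = list(unlabeled[:initial_batch])   # initial manual labeling
    pool = list(unlabeled[initial_batch:])
    for _ in range(rounds):
        if not pool:
            break
        # Score the pool and select the images with the most unseen data.
        pool.sort(key=lambda img: novelty_score(img, labeled), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend(batch)                   # human labels this batch
    return labeled
```

In practice the scoring model would be retrained (or fine-tuned) after each round so that newly labeled examples no longer count as novel, which is what drives the dataset toward class diversity and balance.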