Fast R-CNN [38] was developed to address the limitations of R-CNN. As shown in Fig. 3, a convolutional neural network is applied once to the whole input image to produce a feature map. The region proposals, which are generated by selective search on the input image, are then projected onto this feature map. A region of interest (RoI) pooling layer converts each projected proposal, whatever its size, into a fixed-length feature vector. These vectors are fed into fully connected layers that perform both classification and bounding box regression. Unlike R-CNN, Fast R-CNN is trained in a single stage with a multi-task loss that combines the classification and bounding box regression terms, which greatly reduces the training time. Detection is also faster, because the convolutions are computed only once for the original image rather than separately for every region proposal, and the detection accuracy is improved.
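The following is a minimal, single-channel sketch in plain Python that makes the RoI pooling step and the multi-task loss concrete. It is not taken from [38]. It assumes that proposals are given as (x1, y1, x2, y2) in image pixels and are projected onto the feature map with a `spatial_scale` factor. For example, 1/16 would correspond to a backbone with a total stride of 16. The function names `roi_max_pool`, `flatten`, `smooth_l1`, and `multi_task_loss` are hypothetical. The floor/ceil bin boundaries follow a common RoI max pooling scheme, not any particular library's implementation. The loss has the form L = -log p_u + λ[u ≥ 1] Σ_i smooth_L1(t^u_i - v_i). Here u is the true class (0 is background), t^u is the predicted box offset for class u, and v is the regression target.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def roi_max_pool(feature_map: Grid,
                 roi: Tuple[float, float, float, float],
                 output_size: Tuple[int, int] = (2, 2),
                 spatial_scale: float = 1.0) -> Grid:
    """Max-pool one RoI of a single-channel feature map into a fixed grid.

    Assumption: roi is (x1, y1, x2, y2) in image pixels, and spatial_scale
    maps image coordinates to feature-map cells (e.g. 1/16 for stride 16).
    """
    height, width = len(feature_map), len(feature_map[0])
    # Project the proposal onto the feature map, quantizing to whole cells.
    fx1, fy1, fx2, fy2 = (int(round(c * spatial_scale)) for c in roi)
    roi_w = max(fx2 - fx1 + 1, 1)
    roi_h = max(fy2 - fy1 + 1, 1)
    out_h, out_w = output_size
    bin_h, bin_w = roi_h / out_h, roi_w / out_w

    pooled: Grid = []
    for i in range(out_h):
        # Rows covered by output bin i, clipped to the feature map.
        y_start = min(max(fy1 + math.floor(i * bin_h), 0), height)
        y_end = min(max(fy1 + math.ceil((i + 1) * bin_h), 0), height)
        row = []
        for j in range(out_w):
            # Columns covered by output bin j, clipped to the feature map.
            x_start = min(max(fx1 + math.floor(j * bin_w), 0), width)
            x_end = min(max(fx1 + math.ceil((j + 1) * bin_w), 0), width)
            if y_end <= y_start or x_end <= x_start:
                row.append(0.0)  # empty bin: RoI lies outside the map
            else:
                row.append(max(feature_map[y][x]
                               for y in range(y_start, y_end)
                               for x in range(x_start, x_end)))
        pooled.append(row)
    return pooled


def flatten(grid: Grid) -> List[float]:
    """Turn the pooled grid into the fixed-length vector fed to the FC layers."""
    return [v for row in grid for v in row]


def smooth_l1(x: float) -> float:
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def multi_task_loss(class_probs: Sequence[float], true_class: int,
                    box_pred: Sequence[float], box_target: Sequence[float],
                    lam: float = 1.0) -> float:
    """Classification log loss plus box regression loss for non-background RoIs."""
    cls_loss = -math.log(class_probs[true_class])
    # Background RoIs (class 0) contribute no box regression term.
    loc_loss = (sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
                if true_class >= 1 else 0.0)
    return cls_loss + lam * loc_loss


if __name__ == "__main__":
    fmap = [[4 * y + x for x in range(4)] for y in range(4)]
    big = roi_max_pool(fmap, (0, 0, 3, 3))    # 4 x 4 cell RoI
    small = roi_max_pool(fmap, (0, 0, 1, 3))  # 2 x 4 cell RoI
    print(flatten(big), flatten(small))       # → [5, 7, 13, 15] [4, 5, 12, 13]
```

In this example, two proposals of different sizes both pool to a 2 × 2 grid and therefore flatten to vectors of the same length. This is what allows a single set of fully connected layers to process every proposal.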