Furthermore, the positive pairs with the largest distances also appear to have incorrect labels. We therefore infer that the original bag-of-visual-words algorithm [2] used to compute the ground-truth labels is imperfect, which is a likely reason for many of the misclassified test pairs and for the slump at the beginning of the PR curves. Despite these label errors in the training and test data, we may still conclude that the network has improved at distinguishing similar from dissimilar pairs, since it is realistic to assume that most of the positive/negative labels provided by [2] are correct.
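
The kind of inspection described above can be sketched as follows: rank the positively labelled pairs by descriptor distance and review the farthest ones by hand, since these are the most likely candidates for label errors. This is only a minimal illustration and not necessarily the procedure used here; the function name `suspicious_positives`, the label convention (1 = similar, 0 = dissimilar) and the toy values are assumptions.

```python
def suspicious_positives(distances, labels, k=3):
    """Return the indices of the k positive pairs with the largest distance.

    distances: per-pair descriptor distances (hypothetical values).
    labels:    per-pair ground-truth labels, assumed 1 = similar, 0 = dissimilar.
    """
    # Keep only the pairs that the ground truth marks as similar.
    positives = [i for i, y in enumerate(labels) if y == 1]
    # Sort them so that the pairs the network places farthest apart come first.
    positives.sort(key=lambda i: distances[i], reverse=True)
    return positives[:k]


# Toy data, invented purely for illustration.
distances = [0.12, 0.95, 0.30, 0.88, 0.05, 0.70]
labels = [1, 1, 0, 1, 1, 0]

candidates = suspicious_positives(distances, labels, k=2)
print(candidates)
```

The returned indices identify the positively labelled pairs whose images would be checked manually against the labels provided by [2].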