In this work, we explore the setting of selective answering for a classification task. Our motivation is that not all incorrect predictions are incorrect to the same extent, and not all correct predictions are correct to the same extent, because the model is not equally confident in all of its predictions. For example, suppose a dog classifier gets its prediction wrong on two samples, one of a cat and the other of a car. Though the model is incorrect on both samples, it is clearly more incorrect on the car than on the cat. Hence, assigning the same annotation label to both samples limits the learning of the calibrator. On the other hand, assigning gold annotations based on the degree of correctness gives the calibrator more flexibility to look for fine-grained features that distinguish different annotation scores. Keeping this in mind, we propose a novel method that moves away from categorical labels and directly targets the probability that the model's prediction is correct. Specifically, we transform calibrator training from a classification problem into a regression problem, where the regression score estimates the extent to which the model's prediction is correct. We propose a number of ways to compute gold scores for training the calibrator on the regression task and compare the performance of the proposed method with other approaches in the selective answering literature. Table 1 illustrates the difference between the proposed annotation strategy and the annotation strategy used in existing approaches.
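To make the contrast concrete, the following minimal sketch (not the paper's implementation) trains a conventional binary-label calibrator alongside a regression calibrator fit on a continuous correctness score. The choice of calibration features and of the gold score (here, the base model's probability mass on the gold class) are illustrative assumptions only.

```python
# Sketch: binary-label calibrator vs. regression calibrator.
# Features and gold scores are synthetic stand-ins, not the paper's setup.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)

# Assume each example is represented by calibration features derived from
# the base model (e.g., max softmax probability, entropy, margin).
n, d = 1000, 3
features = rng.random((n, d))

# Stand-in gold score: probability the base model assigned to the gold class.
gold_class_prob = rng.random(n)

# Existing approaches: collapse correctness to a categorical 0/1 label.
binary_labels = (gold_class_prob > 0.5).astype(int)
clf_calibrator = GradientBoostingClassifier().fit(features, binary_labels)

# Proposed direction: regress directly on the degree of correctness.
reg_calibrator = GradientBoostingRegressor().fit(features, gold_class_prob)

# At test time, both calibrators output a confidence score that can be
# thresholded to decide whether to answer or abstain.
conf_clf = clf_calibrator.predict_proba(features[:5])[:, 1]
conf_reg = reg_calibrator.predict(features[:5])
print(conf_clf, conf_reg)
```

The only difference between the two pipelines is the training target: the regression calibrator sees how wrong each prediction was, rather than only whether it was wrong.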