Here, f̃(x_i) is the top conv feature of an occluded face after masking, and F denotes the fc layer(s) of the trunk CNN model that follow the top conv layer; in models such as [13], F can instead be the average pooling layer.

The differential signal and pairwise loss fdiff. The results shown in Figure 3 suggest that the differential signal between the top conv activation values of an occluded face and those of its corresponding clean face is a good indicator of which feature elements are potentially corrupted. In other words, the differential input signal acts as an attention mechanism that encourages the mask generator to focus on those feature elements that have deviated from their true values owing to partial occlusion. Therefore, we feed our mask generator module with the absolute difference between the features of an occlusion-free face and those of its occluded counterpart.
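As an illustration only, the differential input described above can be sketched in a few lines of NumPy. The absolute difference is taken directly from the text. Everything else is an assumption made for the sketch: the function names, the (C, H, W) feature shape, the element-wise multiplicative masking, and in particular `toy_mask_generator`, which is a fixed sigmoid standing in for the learned mask generator rather than the module used in this work.

```python
import numpy as np


def differential_signal(f_clean, f_occluded):
    """Absolute element-wise difference between the top conv features
    of an occlusion-free face and its occluded counterpart."""
    f_clean = np.asarray(f_clean, dtype=np.float64)
    f_occluded = np.asarray(f_occluded, dtype=np.float64)
    if f_clean.shape != f_occluded.shape:
        raise ValueError("feature maps must have the same shape")
    return np.abs(f_clean - f_occluded)


def toy_mask_generator(diff, sharpness=10.0, threshold=0.5):
    """Hypothetical stand-in for the learned mask generator.

    Maps a large deviation to a mask value near 0 and a small deviation
    to a value near 1. The real module is trained; this fixed sigmoid
    (with made-up sharpness and threshold) only shows the data flow.
    """
    return 1.0 / (1.0 + np.exp(sharpness * (diff - threshold)))


def apply_mask(f_occluded, mask):
    """Element-wise masking of the occluded feature (assumed form)."""
    return np.asarray(f_occluded, dtype=np.float64) * mask


# Toy (C, H, W) top conv features; the shape is chosen arbitrarily.
rng = np.random.default_rng(0)
f_clean = rng.standard_normal((4, 3, 3))
f_occluded = f_clean.copy()
f_occluded[:, 0, :] += 2.0  # simulate corruption in the top spatial row

diff = differential_signal(f_clean, f_occluded)
mask = toy_mask_generator(diff)
f_masked = apply_mask(f_occluded, mask)
```

In this toy setup the mask is close to 0 on the corrupted row and close to 1 elsewhere, so `f_masked` suppresses exactly the elements that deviate from the clean feature.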