The network operates on a pair of images with six channels (a pair of RGB images). Therefore, if the original pair consists of a grayscale image and a color image, we convert the RGB image to grayscale and then treat the grayscale pair as a color one. The ratio of grayscale to color images is 5% per object landmark. Since the proposed deep network was pretrained on images in the normal orientation, we automatically rotate the images in our training dataset to the normal orientation using their EXIF information. Finally, we resize the images to 227×227 pixels without cropping and feed them as input to the proposed neural network.
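The preprocessing steps above (grayscale handling, EXIF-based rotation, resizing, and channel stacking) can be sketched as follows; this is an illustrative implementation using Pillow and NumPy, with function and parameter names of our own choosing rather than the authors' actual code:

```python
import numpy as np
from PIL import Image, ImageOps


def preprocess_pair(img_a, img_b, size=(227, 227)):
    """Build a 6-channel array from an image pair (illustrative sketch).

    Follows the steps described in the text: rotate to normal
    orientation via EXIF, unify grayscale/color pairs, resize
    without cropping, and stack along the channel axis.
    """
    # Rotate each image to the normal orientation using EXIF metadata.
    img_a = ImageOps.exif_transpose(img_a)
    img_b = ImageOps.exif_transpose(img_b)

    # If either image is grayscale, convert both to grayscale and
    # replicate to 3 channels so the pair is treated as a color one.
    if img_a.mode == "L" or img_b.mode == "L":
        img_a = img_a.convert("L").convert("RGB")
        img_b = img_b.convert("L").convert("RGB")
    else:
        img_a = img_a.convert("RGB")
        img_b = img_b.convert("RGB")

    # Resize to 227x227 without cropping (aspect ratio not preserved).
    img_a = img_a.resize(size)
    img_b = img_b.resize(size)

    # Concatenate along the channel axis: 227 x 227 x 6.
    return np.concatenate([np.asarray(img_a), np.asarray(img_b)], axis=2)
```

The resulting 227×227×6 array is what the network consumes as a single input sample.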