Rather than providing explainable solutions [35, 50, 40, 29, 37] for certain white-box models by computing importance from information such as the network's weights and gradients, we advocate a more general explainable approach that produces a saliency map for an arbitrary network treated as a black-box model, without requiring details of its architecture or implementation. Such a saliency map shows how important each image pixel is to the network's prediction.

Recently, multiple explainable approaches have been proposed for black-box models. LIME [26, 1] draws random samples around the instance to be explained and fits an approximate linear decision model. However, such a superpixel-based saliency method may not group the correct regions. RISE [27] probes the black-box model by subsampling the input image with random masks and generates the final importance map as a linear combination of the random binary masks. Although this is a seemingly simple yet surprisingly powerful approach for black-box models, its results are still far from perfect, especially in complex scenes.
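To make the random-mask idea concrete, the following is a minimal sketch of a RISE-style saliency computation: each random binary mask is weighted by the black-box model's score on the masked input, and the weighted masks are accumulated into an importance map. The callable `model`, and the parameters `num_masks`, `grid_size`, and `p_keep` are illustrative assumptions; the original RISE implementation additionally uses bilinear upsampling of randomly shifted coarse masks, which is simplified here.

```python
import numpy as np

def rise_style_saliency(model, image, num_masks=1000, grid_size=7, p_keep=0.5):
    """Sketch of a RISE-style saliency map for a black-box model.

    `model` is assumed to be a callable that takes an (H, W, 3) image and
    returns the scalar probability of the target class.
    """
    H, W = image.shape[:2]
    saliency = np.zeros((H, W), dtype=np.float32)
    for _ in range(num_masks):
        # Sample a coarse binary grid and upsample it to image size
        # (nearest-neighbour here to keep the sketch dependency-free;
        # the original method uses smooth, shifted bilinear upsampling).
        grid = (np.random.rand(grid_size, grid_size) < p_keep).astype(np.float32)
        cell_h = H // grid_size + 1
        cell_w = W // grid_size + 1
        mask = np.kron(grid, np.ones((cell_h, cell_w), dtype=np.float32))[:H, :W]
        # Query the black box on the masked image and weight the mask by its score.
        score = model(image * mask[..., None])
        saliency += score * mask
    # Normalize by the expected number of times each pixel is kept.
    return saliency / (num_masks * p_keep)
```

In this sketch, pixels that frequently appear in high-scoring masks accumulate large values, which is the linear-combination intuition described above.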