to compute image gradients, the kernels for the divergencecomputation and even τ, λ, θ. In Table 2, we compare theeffects of learning different parameters. We find that learningthe Sobel kernel values reduces performance due to noisygradients particularly when the batch size is limited, butlearning the divergence and τ, λ, θ is beneficial.