So, for instance, if we're training with MNIST images, and input an image of a 7, then the log-likelihood cost is $-\ln a^L_7$. To see that this makes intuitive sense, consider the case when the network is doing a good job, that is, it is confident the input is a 7. In that case it will estimate a value for the corresponding probability $a^L_7$ which is close to $1$, and so the cost $-\ln a^L_7$ will be small. By contrast, when the network isn't doing such a good job, the probability $a^L_7$ will be smaller, and the cost $-\ln a^L_7$ will be larger. So the log-likelihood cost behaves as we'd expect a cost function to behave.
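To make this concrete, here is a minimal sketch of the idea in Python. The helper name `log_likelihood_cost` and the two example activation vectors are invented for illustration; they stand in for the softmax output $a^L$ of a hypothetical network shown an image of a 7.

```python
import numpy as np

def log_likelihood_cost(activations, correct_digit):
    """Return -ln(a_y), the log-likelihood cost for the true class y."""
    return -np.log(activations[correct_digit])

# Hypothetical softmax outputs for an MNIST image of a 7.
# Confident network: nearly all the probability mass is on digit 7.
confident = np.array([0.01, 0.01, 0.01, 0.01, 0.01,
                      0.01, 0.01, 0.92, 0.005, 0.005])
# Less confident network: the probability mass is spread out.
unconfident = np.array([0.1, 0.1, 0.1, 0.1, 0.1,
                        0.1, 0.1, 0.2, 0.05, 0.05])

print(log_likelihood_cost(confident, 7))    # small cost: -ln(0.92) ~ 0.08
print(log_likelihood_cost(unconfident, 7))  # larger cost: -ln(0.2) ~ 1.61
```

As the probability assigned to the correct digit falls, the cost grows, just as the argument above predicts.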