We're starting to build up some feel for the softmax function and the way softmax layers behave. Just to review where we're at: the exponentials in Equation (78) ensure that all the output activations are positive. And the sum in the denominator of Equation (78) ensures that the softmax outputs sum to 1. So that particular form no longer appears so mysterious: rather, it is a natural way to ensure that the output activations form a probability distribution. You can think of softmax as a way of rescaling the $z^L_j$, and then squishing them together to form a probability distribution.
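To make this concrete, here is a minimal Python sketch of the idea (the function name `softmax` and the sample inputs are illustrative, not from the text): the exponentials guarantee positive outputs, and dividing by their sum normalizes them so they add up to 1, just as in Equation (78).

```python
import numpy as np

def softmax(z):
    """Rescale the vector z into a probability distribution, as in Equation (78)."""
    exps = np.exp(z)          # exponentials: every entry becomes positive
    return exps / exps.sum()  # dividing by the sum makes the entries sum to 1

# Illustrative weighted inputs z^L_j (hypothetical values)
z = np.array([3.0, 1.0, 0.2])
a = softmax(z)
print(a)        # approximately [0.836, 0.113, 0.051] -- all positive
print(a.sum())  # 1.0 -- a valid probability distribution
```

Note how the largest $z^L_j$ dominates after exponentiation: the rescaling exaggerates differences between the inputs before squishing them into the unit interval.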