When the input to an activation function falls in its saturation region, near the maximum or minimum of its output, the gradient tends to vanish. The ReLU activation function preserves the original gradient whenever its input is greater than 0, which alleviates the vanishing-gradient phenomenon, and it is currently the most widely used activation function. However, when its input is less than 0 the output is suppressed to zero, so the gradient vanishes for negative inputs.
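For reference, this behavior follows directly from the definition of ReLU and its derivative (a standard formulation, written here for illustration):

$$
\mathrm{ReLU}(x) = \max(0, x), \qquad
\frac{d\,\mathrm{ReLU}(x)}{dx} =
\begin{cases}
1, & x > 0 \\
0, & x < 0
\end{cases}
$$

For positive inputs the gradient is passed through unchanged, while for negative inputs it is exactly zero, which is the negative-side gradient vanishing described above.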