The activation function plays a key role in neural networks. A network without activation functions reduces to a composition of linear transformations and cannot represent complex features; therefore, activation layers are inserted during network construction to introduce nonlinearity. Besides being nonlinear, the activation function must be differentiable (at least almost everywhere) so that the network can be trained with gradient-based optimization. Figure 6 shows three typical activation functions: Sigmoid, Tanh, and ReLU. As the figure shows, the Sigmoid and Tanh curves saturate at both ends, where the gradient approaches 0; when the input to the activation layer is very large or very small, the gradient vanishes. The ReLU activation function preserves the gradient when the input is greater than 0, alleviating the vanishing-gradient problem, and is currently the most widely used activation function. However, inputs less than 0 are suppressed (both the output and the gradient are 0), so the gradient still vanishes on the negative side.
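To make the saturation behaviour concrete, the sketch below (an illustrative example, not taken from the paper; it assumes NumPy) evaluates the three activations of Figure 6 and their derivatives at a few sample inputs. The Sigmoid and Tanh gradients shrink toward 0 for large-magnitude inputs, while the ReLU gradient stays at 1 for positive inputs and is 0 for negative ones.

```python
import numpy as np

# Illustrative sketch (assumed, not from the paper): the three activations
# in Figure 6 and their derivatives, showing where the gradient vanishes.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25 (at x = 0), ~0 for |x| large

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # ~0 for |x| large (saturation)

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(x.dtype)  # 1 for x > 0, 0 for x <= 0

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid_grad(x))  # gradient near 0 at both extremes
print(tanh_grad(x))     # same saturation behaviour
print(relu_grad(x))     # gradient preserved (= 1) for x > 0, but 0 for x <= 0
```

The printed ReLU gradients make the trade-off visible: the positive side keeps a constant gradient of 1, while every non-positive input contributes a gradient of exactly 0, which is the negative-side vanishing described above.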