Each row corresponds to one time point, and each channel in a row holds the feature value of the picture frame obtained at that time point. The feature values of the same channel at different time points are shifted along the time dimension: some channels are shifted down by one grid, and others are shifted up by one grid. After shifting, the vacated positions are filled with 0, and the channel values that move past the edge of the feature map are discarded. This achieves two-way translation, so that after the shift the feature information of adjacent frames is mixed with that of the current frame.
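The two-way shift described above can be illustrated with a minimal sketch in plain Python. It assumes the feature map is a list of rows (one per time point), each holding one value per channel. The function name `temporal_shift` and the parameters `n_down` and `n_up` are hypothetical, and the choice of which channels move, and that the remaining channels stay in place, is an assumption, since the text only says that "some" channels shift in each direction.

```python
def temporal_shift(features, n_down, n_up):
    """Shift channel values along the time axis, filling vacated slots with 0.

    features: list of T rows (one per time point), each a list of C channel values.
    n_down:   (assumed) the first n_down channels move down one row (t -> t + 1).
    n_up:     (assumed) the next n_up channels move up one row (t -> t - 1).
    Any remaining channels are left unshifted (an assumption, not from the text).
    """
    T = len(features)
    C = len(features[0]) if T else 0
    if n_down + n_up > C:
        raise ValueError("n_down + n_up must not exceed the channel count")

    # Start from all zeros so vacated positions are already zero-filled.
    out = [[0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < n_down:
                src = t - 1   # shifted down: row t receives the value from row t - 1
            elif c < n_down + n_up:
                src = t + 1   # shifted up: row t receives the value from row t + 1
            else:
                src = t       # unshifted channel
            # Values whose source lies outside the map were shifted out; keep 0.
            if 0 <= src < T:
                out[t][c] = features[src][c]
    return out


frames = [
    [1, 2, 3],  # time point 0
    [4, 5, 6],  # time point 1
    [7, 8, 9],  # time point 2
]
shifted = temporal_shift(frames, n_down=1, n_up=1)
for row in shifted:
    print(row)
```

In this example the middle row of the result holds the value of channel 0 from the previous time point, the value of channel 1 from the next time point, and its own value of channel 2, which is the mixing of adjacent-frame information that the shift is meant to achieve.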