The global features of the vehicle are extracted from the input image by the four convolutional layers of the backbone network; the resulting three-dimensional tensor is denoted T and has size [m, n].
The input image is passed through the four convolutional layers of the backbone network to extract the vehicle's global features, and the resulting three-dimensional tensor is recorded as T, with size [m, n].
The global feature of the vehicle is extracted from the input image through four convolutional layers of the backbone network, and the resulting three-dimensional tensor is recorded as T, whose size is [m, n].
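
As a rough illustration of how the spatial size [m, n] of T could follow from the input image size, the sketch below applies the standard convolution output-size formula four times in a row. The kernel sizes, strides, and paddings in `LAYERS`, the helper names, and the 256 x 256 input are all hypothetical; the text does not specify the backbone's layer settings, and the channel dimension is left out for the same reason.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output length along one axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical (kernel, stride, padding) settings for the four layers;
# these values are assumptions, not taken from the source.
LAYERS = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]


def feature_map_size(height, width, layers=LAYERS):
    """Track the spatial size of the feature map through each layer in turn."""
    for kernel, stride, padding in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
    return height, width


# Spatial size [m, n] of T for an assumed 256 x 256 input image.
m, n = feature_map_size(256, 256)
```

With these assumed settings each layer halves the spatial resolution, so m and n are the input height and width divided by 16; a different backbone would give a different ratio.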