Where to compute flow? To determine where in the network to compute the flow, we compare applying our flow layer on the RGB input, after the first conv. layer, and after each of the 5 residual blocks. The results are shown in Table 1. We find that computing the flow on the input provides poor performance, similar to the performance of the flow-only networks, but there is a significant jump after even 1 layer, suggesting that computing the flow of a feature is beneficial, capturing both the appearance and motion information. However, after 4 layers, the performance begins to decline as the spatial information is too abstracted/compressed (due to pooling and large spatial receptive field size), and sequential features become very similar, containing less motion information. Note that our HMDB performance in this table is quite low compared to state-of-the-art methods due to being trained from scratch using few frames and low spatial resolution (112 × 112). For the following experiments, unless otherwise noted, we apply the layer after the 3rd residual block. In Fig. 7, we visualize the learned motion representations computed after block 3.
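
As a rough illustration of the placements compared in this ablation, the sketch below enumerates the seven candidate positions (RGB input, after the first conv. layer, and after each of the 5 residual blocks). The stage names (`conv1`, `res1` ... `res5`) and the helpers `place_flow_layer` and `ablation_configs` are hypothetical and chosen only for this example; the code only tracks the order of stages and does not compute flow.

```python
# Hypothetical stage names: the text only specifies a first conv. layer
# followed by 5 residual blocks.
BACKBONE = ["conv1", "res1", "res2", "res3", "res4", "res5"]


def place_flow_layer(position, backbone=BACKBONE):
    """Return the stage order with a 'flow' stage inserted after `position`.

    `position` is "input" (flow computed directly on the RGB frames)
    or the name of a backbone stage.
    """
    if position == "input":
        return ["flow"] + list(backbone)
    if position not in backbone:
        raise ValueError(f"unknown position: {position!r}")
    idx = backbone.index(position) + 1
    return list(backbone[:idx]) + ["flow"] + list(backbone[idx:])


def ablation_configs(backbone=BACKBONE):
    """Enumerate every placement: input, after conv1, after each block."""
    return {pos: place_flow_layer(pos, backbone)
            for pos in ["input"] + list(backbone)}


if __name__ == "__main__":
    for pos, stages in ablation_configs().items():
        print(f"{pos:>5}: {' -> '.join(stages)}")
```

Under these assumed names, `place_flow_layer("res3")` corresponds to the default setting used in the following experiments, with the flow layer applied after the 3rd residual block.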