CNNs) and addition/multiplication/concatenation fusion. In Table 4, we compare different fusion methods at different locations in the network. We find that fusing RGB information is very important when computing flow directly from RGB input. However, it is less beneficial when computing the flow of representations, as the CNN has already abstracted much of the appearance information away. We found that concatenation of the RGB and flow features performs poorly compared to addition and multiplication. We do not use two-stream fusion in any other experiments, as we found that computing the representation flow after the 3rd residual block provides sufficient performance even without any fusion.
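To make the three fusion operations concrete, the following is a minimal sketch in NumPy. It is not the implementation used in these experiments: the function name `fuse`, the channels-first `(C, H, W)` layout, and the toy feature values are all assumptions chosen for illustration.

```python
import numpy as np


def fuse(rgb_feat, flow_feat, method="add"):
    """Fuse two feature maps laid out as (C, H, W).

    Hypothetical helper for illustration only; the layout and the
    method names are assumptions, not taken from the paper.
    """
    if method == "concat":
        # Concatenation only needs matching spatial dimensions;
        # the output has C_rgb + C_flow channels.
        if rgb_feat.shape[1:] != flow_feat.shape[1:]:
            raise ValueError("spatial dimensions must match for concatenation")
        return np.concatenate([rgb_feat, flow_feat], axis=0)

    # Element-wise fusion needs identical shapes.
    if rgb_feat.shape != flow_feat.shape:
        raise ValueError("shapes must match for element-wise fusion")
    if method == "add":
        return rgb_feat + flow_feat
    if method == "mul":
        return rgb_feat * flow_feat
    raise ValueError(f"unknown fusion method: {method}")


# Toy feature maps: 2 channels, 3x3 spatial resolution.
rgb = np.full((2, 3, 3), 2.0)
flow = np.full((2, 3, 3), 0.5)

fused_add = fuse(rgb, flow, "add")
fused_mul = fuse(rgb, flow, "mul")
fused_cat = fuse(rgb, flow, "concat")

print(fused_add.shape, fused_mul.shape, fused_cat.shape)
```

Addition and multiplication preserve the channel count, so the layers after the fusion point can be left unchanged. Concatenation doubles the channel count, so the next layer would have to accept the wider input.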