Comparison to other motion representations. We compare to existing CNN-based motion representation methods to confirm the usefulness of our representation flow. For these experiments, we used code provided by the authors when available and otherwise implemented the methods ourselves. To better compare to existing works, we used (16×)224 × 224 images. Table 8 shows the results. MFNet [15] captures motion by spatially shifting CNN feature maps and then summing the results; TVNet [5] applies a convolutional optical flow method to RGB inputs; and ActionFlowNet