Inspired by the optical flow algorithm, we design a fully-differentiable, learnable, convolutional representation flow layer by extending the general algorithm outlined above. The main differences are that (i) we allow the layer to capture the flow of any CNN feature map, and (ii) we learn its parameters, including θ, λ, and τ, as well as the divergence weights. We also make several key changes to reduce computation time: (1) we only use a single scale, (2) we do not perform any warping, and (3) we compute the flow on a CNN tensor with a smaller spatial size. Multi-scale processing and warping are computationally expensive, each requiring many iterations. By learning the flow parameters, we can eliminate the need for these additional steps. Our method is applied to lower-resolution CNN feature maps, instead of the RGB input, and is trained in an end-to-end fashion. This not only benefits its speed, but also allows the model to learn a motion representation optimized for activity recognition. We note that the brightness consistency assumption can similarly be applied to CNN feature maps: instead of capturing pixel brightness, we capture feature value consistency.
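
To make the design concrete, the following is a minimal sketch of such a layer, assuming a PyTorch implementation of a TV-L1-style iteration on a pair of single-channel feature maps. The class name RepFlowLayer, the initial values of θ, λ, and τ, the filter initializations, and the per-channel treatment are illustrative assumptions, not the released code; only the single-scale, no-warping iteration with learnable parameters follows the description above.

```python
# Hypothetical sketch: a learnable, single-scale TV-L1-style flow iteration
# applied to CNN feature maps (feature-value consistency in place of
# brightness consistency). No warping and no image pyramid are used.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RepFlowLayer(nn.Module):
    def __init__(self, num_iter=10):
        super().__init__()
        self.num_iter = num_iter
        # TV-L1 hyper-parameters made learnable (initial values are assumptions).
        self.theta = nn.Parameter(torch.tensor(0.3))
        self.lam = nn.Parameter(torch.tensor(0.15))
        self.tau = nn.Parameter(torch.tensor(0.25))
        # Learnable gradient filters (Sobel init) and divergence weights.
        sobel = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]]) / 8.
        self.grad_x = nn.Parameter(sobel.clone())
        self.grad_y = nn.Parameter(sobel.transpose(2, 3).clone())
        self.div_x = nn.Parameter(torch.tensor([[[[-1., 1., 0.]]]]))
        self.div_y = nn.Parameter(torch.tensor([[[[-1.], [1.], [0.]]]]))

    def spatial_grad(self, x):
        gx = F.conv2d(x, self.grad_x, padding=1)
        gy = F.conv2d(x, self.grad_y, padding=1)
        return gx, gy

    def divergence(self, px, py):
        dx = F.conv2d(px, self.div_x, padding=(0, 1))
        dy = F.conv2d(py, self.div_y, padding=(1, 0))
        return dx + dy

    def forward(self, f1, f2):
        # f1, f2: consecutive feature maps of shape (B, 1, H, W).
        u1 = torch.zeros_like(f1)
        u2 = torch.zeros_like(f1)
        p11 = torch.zeros_like(f1); p12 = torch.zeros_like(f1)
        p21 = torch.zeros_like(f1); p22 = torch.zeros_like(f1)

        gx, gy = self.spatial_grad(f2)
        grad_sq = gx ** 2 + gy ** 2 + 1e-12
        f_t = f2 - f1                       # temporal difference (no warping step)

        lt = self.lam * self.theta
        for _ in range(self.num_iter):
            rho = gx * u1 + gy * u2 + f_t   # linearized residual
            # Per-pixel thresholding step of TV-L1.
            v1 = torch.where(rho < -lt * grad_sq, u1 + lt * gx,
                 torch.where(rho > lt * grad_sq, u1 - lt * gx, u1 - rho * gx / grad_sq))
            v2 = torch.where(rho < -lt * grad_sq, u2 + lt * gy,
                 torch.where(rho > lt * grad_sq, u2 - lt * gy, u2 - rho * gy / grad_sq))
            # Flow update from the (learnable) divergence of the dual variables.
            u1 = v1 + self.theta * self.divergence(p11, p12)
            u2 = v2 + self.theta * self.divergence(p21, p22)
            # Dual variable update.
            u1x, u1y = self.spatial_grad(u1)
            u2x, u2y = self.spatial_grad(u2)
            tt = self.tau / self.theta
            n1 = 1. + tt * torch.sqrt(u1x ** 2 + u1y ** 2 + 1e-12)
            n2 = 1. + tt * torch.sqrt(u2x ** 2 + u2y ** 2 + 1e-12)
            p11 = (p11 + tt * u1x) / n1; p12 = (p12 + tt * u1y) / n1
            p21 = (p21 + tt * u2x) / n2; p22 = (p22 + tt * u2y) / n2

        return torch.cat([u1, u2], dim=1)   # two-channel flow of the feature map
```

Because every operation above is a convolution or an elementwise tensor operation, gradients propagate to θ, λ, τ, and the gradient/divergence filters, so the layer can be trained end-to-end with the rest of the recognition network.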