Abstract. In this paper we evaluate and improve different vector implementation
techniques of AES-based designs. We analyze how well
the T-table, bitsliced and bytesliced implementation techniques apply
to the SHA-3 finalist Grøstl. We present a number of new Grøstl implementations
that improve upon many previous results. For example,
our fastest ARM NEON implementation of Grøstl is 40% faster than
the previously fastest ARM implementation. We present the first Intel
AVX2 implementations of Grøstl, which require 40% fewer instructions
than previous implementations. Furthermore, we present ARM Cortex-M0
implementations of Grøstl that improve the speed by 55% or the
memory requirements by 15%.