Another possible way to mitigate the degradation problem in MA is to warm up the per-split optimization by introducing historical update information into SGD. However, its gain is limited in comparison with BMUF, so we do not report its results here.
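To make the warm-up idea concrete, the following is a minimal sketch of one possible reading, not the procedure evaluated in this work. It assumes that "history update information" means the momentum (velocity) state of SGD, and that warming up means starting each worker's velocity from the average of the workers' final velocities in the previous block rather than from zero. The toy 1-D regression task, the function names `local_sgd` and `model_averaging`, and all hyperparameter values are hypothetical.

```python
import random


def local_sgd(w, velocity, split, lr, momentum):
    """Run momentum SGD over one worker's data split for the model y ~ w * x."""
    for x, y in split:
        grad = 2.0 * (w * x - y) * x          # gradient of the squared error
        velocity = momentum * velocity - lr * grad
        w += velocity
    return w, velocity


def model_averaging(blocks, lr=0.05, momentum=0.9, warm_start=False):
    """Plain model averaging (MA) over a sequence of data blocks.

    Each block is a list of per-worker splits. With warm_start=True, every
    worker starts a block from the averaged final velocity of the previous
    block (an assumed interpretation of "history update information").
    """
    w_global = 0.0
    carried_velocity = 0.0
    for splits in blocks:
        init_velocity = carried_velocity if warm_start else 0.0
        results = [
            local_sgd(w_global, init_velocity, split, lr, momentum)
            for split in splits
        ]
        # MA step: the new global model is the mean of the local models.
        w_global = sum(w for w, _ in results) / len(results)
        carried_velocity = sum(v for _, v in results) / len(results)
    return w_global


def make_blocks(n_blocks=20, n_workers=4, split_size=25, w_true=3.0, seed=0):
    """Generate synthetic blocks of (x, y) pairs with y = w_true * x + noise."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        splits = []
        for _ in range(n_workers):
            xs = [rng.uniform(0.5, 1.5) for _ in range(split_size)]
            splits.append([(x, w_true * x + rng.gauss(0.0, 0.1)) for x in xs])
        blocks.append(splits)
    return blocks


if __name__ == "__main__":
    data = make_blocks()
    print("cold start:", model_averaging(data, warm_start=False))
    print("warm start:", model_averaging(data, warm_start=True))
```

On this toy problem both variants recover a weight close to the true value, so the sketch only illustrates the mechanics of carrying update history across splits; it says nothing about the size of the gain relative to BMUF.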