Before the result is passed to the feedforward layer, the attention matrices of the individual heads need to be merged. After the dot-product attention has been computed for each head, the per-head output matrices are concatenated and multiplied by an additional learned weight matrix. The resulting output matrix combines the information from all attention heads.
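
The merge step can be illustrated with a minimal sketch. It assumes the standard multi-head formulation, in which the per-head outputs are concatenated along the feature dimension and multiplied by an output projection matrix. The names `concat_heads`, `matmul`, `merge_heads`, and `w_o`, as well as the toy values, are hypothetical and chosen only for illustration. Plain Python lists are used so the example runs without extra libraries.

```python
def concat_heads(head_outputs):
    """Join per-head matrices column-wise.

    Each head output has shape (seq_len, d_head); the result has
    shape (seq_len, num_heads * d_head).
    """
    return [sum(rows, []) for rows in zip(*head_outputs)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_heads(head_outputs, w_o):
    """Concatenate the head outputs, then apply the output projection w_o.

    w_o is assumed to have shape (num_heads * d_head, d_model).
    """
    return matmul(concat_heads(head_outputs), w_o)


# Toy example (assumed values): 2 heads, seq_len = 2, d_head = 2, d_model = 2
head_1 = [[1.0, 0.0], [0.0, 1.0]]
head_2 = [[2.0, 2.0], [3.0, 3.0]]
w_o = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # shape (4, 2)

merged = merge_heads([head_1, head_2], w_o)
print(merged)  # [[3.0, 2.0], [3.0, 4.0]]
```

Each row of `merged` depends on the corresponding row of every head, which is the sense in which the output matrix carries the information from all attention heads. In practice the projection matrix is learned during training rather than fixed as it is here.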