To improve the semantic fidelity of generated images, we construct a semantic comparison module. Two considerations guided its design. (a) Reconstructing the text information would introduce additional losses and reduce generation efficiency. The module therefore only maps text and image features into the same semantic space and improves the semantic consistency of the generated image by reducing the difference between the text and image features. (b) When computing the text-image loss, the text and image features are first classified and refined, the text-image similarity is then computed, and the difference between text and image is reduced with a consistency adversarial loss and a classification loss. In addition, word-level control is added on top of channel attention to form a hybrid attention mechanism. The hybrid attention mechanism extracts the most relevant words in the text and guides the generator to pay more attention to image details. Experimental results show that our method outperforms previous image generation methods.
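The two ideas in this paragraph, comparing text and image features in a shared semantic space and scoring word relevance against channel features, can be illustrated with a minimal sketch. It is not the module's actual implementation: the linear projections, the `1 - cosine similarity` form of the loss, the dot-product scoring, and all function and parameter names (`project`, `semantic_consistency_loss`, `word_relevance`, `w_text`, `w_image`, `channel_descriptors`) are assumptions made for illustration, and the classification and adversarial terms are omitted.

```python
import math


def project(features, weights):
    # Linear map into the shared space; `weights` holds one row per output dimension.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small constant avoids division by zero


def semantic_consistency_loss(text_feat, image_feat, w_text, w_image):
    # Assumed loss form: 1 - cosine similarity of the projected features,
    # so aligned text and image features give a loss near 0.
    shared_text = project(text_feat, w_text)
    shared_image = project(image_feat, w_image)
    return 1.0 - cosine_similarity(shared_text, shared_image)


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_relevance(channel_descriptors, word_feats):
    # Assumed scoring: each channel attends over the words via dot products;
    # a word's relevance is the average attention it receives across channels.
    per_channel = [
        softmax([sum(c * w for c, w in zip(channel, word)) for word in word_feats])
        for channel in channel_descriptors
    ]
    n_channels = len(per_channel)
    return [
        sum(row[j] for row in per_channel) / n_channels
        for j in range(len(word_feats))
    ]


# Toy usage with identity projections (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
loss = semantic_consistency_loss([1.0, 2.0], [2.0, 4.0], identity, identity)
relevance = word_relevance([[1.0, 0.0], [0.5, 0.0]], [[0.0, 1.0], [3.0, 0.0]])
```

In this sketch the loss depends only on the angle between the projected features, so the module never has to reconstruct the text. The relevance scores sum to one and could be used to weight word features before they are passed to the generator.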