Traditional image retrieval systems often fail to return the content users are actually looking for. First, in terms of natural language processing, the analysis of user queries does not reach the semantic level. Processing is limited to word segmentation and adjusting the weights of individual terms, so the overall search results are relatively poor, especially for uncommon phrase combinations. Second, in terms of resource selection, some of the results users expect are not in the database at all and therefore can never be shown. This is because data mining technology is still imperfect: it cannot accurately extract the relationship between images and their accompanying text. Main-image identification is a particular weakness, since a secondary image in a news article or blog post is easily mistaken for the main image. Third, results suffer from timeliness problems. Some queries are time-sensitive, that is, the user expects to retrieve images related to breaking news. A retrieval mechanism that ignores timeliness cannot meet this need, because the whole system must establish time-sensitive handling of resources at every stage: ingestion, display, weighting, ranking, and removal. Finally, traditional image retrieval systems make little use of visual features and learning-based methods, so noisy results appear frequently and the user experience is rather unsatisfactory.