A topic crawler must filter pages by topic as it fetches them, so that only information relevant to the topic is captured. To do this, the page-processing pipeline needs several additional modules: a topic building module, an initial page information optimization module, a topic relevance analysis module, and a sorting module. The topic building module defines the topic that the crawler targets, which is the foundation for the rest of the crawler's work. The topic relevance analysis module calculates how relevant each web page is to the topic and uses that result to decide whether the page is kept; it is the most important part of the topic crawler. The initial page information optimization module selects better seed sites for a specific topic so that the crawler can start its work smoothly. It is an auxiliary module and does not take part in the data-flow processing.
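As a rough illustration of topic relevance analysis, the sketch below scores each page against a weighted keyword list and keeps only pages above a cutoff. The text does not specify a relevance measure, so cosine similarity over term counts is an assumption here, as are the names `cosine_relevance` and `filter_and_sort`, the `threshold` value, the sample topic weights, and the example URLs. The final sort by score is only one guess at what a sorting module might do.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_relevance(topic_terms, page_text):
    """Cosine similarity between topic term weights and page term counts.

    topic_terms is a hypothetical {term: weight} mapping standing in for
    the output of the topic building module.
    """
    page_counts = Counter(tokenize(page_text))
    dot = sum(weight * page_counts[term] for term, weight in topic_terms.items())
    topic_norm = math.sqrt(sum(w * w for w in topic_terms.values()))
    page_norm = math.sqrt(sum(c * c for c in page_counts.values()))
    if topic_norm == 0 or page_norm == 0:
        return 0.0
    return dot / (topic_norm * page_norm)


def filter_and_sort(topic_terms, pages, threshold=0.2):
    """Keep pages whose relevance meets the threshold, most relevant first."""
    scored = [(url, cosine_relevance(topic_terms, text)) for url, text in pages.items()]
    kept = [(url, score) for url, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Illustrative data only: weights and pages are made up for this sketch.
TOPIC = {"crawler": 1.0, "web": 0.5, "search": 0.5}
PAGES = {
    "http://example.com/a": "A web crawler fetches web pages for a search engine",
    "http://example.com/b": "Recipes for baking bread at home",
}

if __name__ == "__main__":
    for url, score in filter_and_sort(TOPIC, PAGES):
        print(f"{url}\t{score:.3f}")
```

With this sample data only the first page shares terms with the topic, so the second page is dropped. A real crawler would likely replace the raw term counts with a weighting scheme suited to its corpus and tune the threshold empirically.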