In recent years, with the development of the Internet, effective access to online information has become a new research focus for many Internet companies. In an era of abundant data, those who can obtain more useful data can earn more profit, and the web crawler is the most commonly used tool for collecting data from the Internet.

A web crawler, also known as a web robot or web spider, is a program that automatically collects information from the Internet by following given URLs according to certain rules. This paper implements a web crawler system based on Python and discusses the main problems encountered in the process, such as how to simulate login with Python, how to extract information by matching strings with regular expressions, and how to store the collected data in MySQL.

With this crawler, one can easily collect data from the Douban website, such as dynamic information from the site's homepage, movie details, and movie reviews.

Keywords: Python; regular expression; MySQL; web crawler
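As a brief illustration of the pipeline summarized above (fetch a page, extract fields with a regular expression, store the results in MySQL), the following is a minimal sketch. It is not the paper's implementation: the URL, the regular expression pattern, and the database connection parameters are assumptions chosen for demonstration, and the requests and pymysql libraries stand in for whatever HTTP and database layers the actual system uses.

```python
import re
import requests
import pymysql

# Fetch a Douban page. The URL and User-Agent header are placeholders,
# not taken from the paper's system.
headers = {"User-Agent": "Mozilla/5.0"}
resp = requests.get("https://movie.douban.com/top250",
                    headers=headers, timeout=10)
resp.raise_for_status()

# Extract movie titles by matching strings with a regular expression.
# The pattern assumes a particular markup and may need adjusting to
# Douban's actual HTML.
titles = re.findall(r'<span class="title">([^&<]+)</span>', resp.text)

# Store the extracted data in MySQL (connection parameters are placeholders).
conn = pymysql.connect(host="localhost", user="root", password="secret",
                       database="douban", charset="utf8mb4")
try:
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS movies ("
                    "id INT AUTO_INCREMENT PRIMARY KEY, "
                    "title VARCHAR(255))")
        cur.executemany("INSERT INTO movies (title) VALUES (%s)",
                        [(t,) for t in titles])
    conn.commit()
finally:
    conn.close()
```

Simulated login, which the paper also covers, would typically be layered on top of such a sketch by posting credentials with a requests.Session so that subsequent page fetches carry the authenticated cookies.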