Abstract: Taking the earthquakes recorded in the mainland China earthquake catalog from 2010 to 2019 as an example, we propose an Internet information acquisition technique based on the Baidu search engine and define a set of URL generation rules that use "time, place name, magnitude" as keywords. The first 100 sites returned by Baidu under these rules are used to build a basic corpus of earthquake information, which forms a method for acquiring Internet disaster information on earthquakes. An existing stop-word list is used to remove useless information as a preliminary cleaning of the crawled text. The implied information is then mined further to explore disaster correlations and to establish a basis for rapid acquisition of Internet disaster information after earthquakes.
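As a rough illustration of the two steps named in the abstract (keyword-based URL generation and stop-word cleaning), a minimal Python sketch might look like the following. The function names, the sample event, and the stop-word set are hypothetical, and the use of the `wd` (query) and `pn` (result offset) parameters with ten results per page is an assumption about Baidu's search URL format, not a description of the rules actually used in this work.

```python
from urllib.parse import urlencode

# Assumed Baidu search endpoint; "wd" and "pn" are assumed parameter names.
BAIDU_SEARCH = "https://www.baidu.com/s"


def build_search_urls(time, place, magnitude, total=100, per_page=10):
    """Build one search URL per result page for a single earthquake event.

    The query is the "time, place name, magnitude" keyword triple; paging
    through `total` results in steps of `per_page` covers the first 100 sites.
    """
    query = " ".join([time, place, magnitude])
    return [
        f"{BAIDU_SEARCH}?{urlencode({'wd': query, 'pn': offset})}"
        for offset in range(0, total, per_page)
    ]


def remove_stop_words(tokens, stop_words):
    """Drop tokens found in the stop-word list (preliminary cleaning)."""
    return [tok for tok in tokens if tok not in stop_words]


# Hypothetical event and stop-word list, for illustration only.
urls = build_search_urls("2013-04-20", "Lushan", "M7.0")
cleaned = remove_stop_words(
    ["the", "earthquake", "in", "Lushan", "damaged", "roads"],
    {"the", "in", "a", "of"},
)
```

Here `urls` holds ten page URLs for one event, and `cleaned` keeps only the tokens that are not in the stop-word set. In practice the tokens would come from a word segmenter applied to the crawled pages, which this sketch does not cover.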