A Spring Boot crawler that collects trending (hot-search) topics from across the web
Every day at 3 p.m., the project crawls hot-search data from across the web, including:
- Weibo hot search
- Bilibili hot search
- CSDN hot search
- Zhihu hot search
- Toutiao (Today's Headlines)
- Baidu hot search
- Juejin
- 36Kr
- QQ News
- Sspai (ShaoShuPai)
After crawling:
- The raw data is stored in MySQL.
- Word-frequency statistics are computed and stored in Redis.
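The word-frequency step can be pictured with the minimal Java sketch below. It is a hypothetical illustration, not the project's code: it splits titles on whitespace (real Chinese titles would need a word segmenter first), counts the tokens, and keeps the top N, which is the kind of map one might then write to Redis. The names WordFrequencySketch, countWords, and topN are invented for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not the project's implementation: count word frequencies
// across crawled titles and keep the top N. Assumes whitespace-separated words;
// Chinese titles would first need a word segmenter.
final class WordFrequencySketch {

    // Counts how often each whitespace-separated token appears across all titles.
    static Map<String, Long> countWords(List<String> titles) {
        return titles.stream()
                .flatMap(title -> Arrays.stream(title.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    // Returns the n most frequent words, highest count first, ties broken alphabetically.
    static Map<String, Long> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> titles = List.of("AI model release", "new AI chip", "chip shortage");
        System.out.println(topN(countWords(titles), 2)); // prints {AI=2, chip=2}
    }
}
```

Breaking ties alphabetically keeps the top-N result deterministic, which matters if the same map is rewritten to Redis on every run.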
To get started quickly:
- Make sure Maven is installed.
- Sync the Maven dependencies.
- Run the SQL script src/main/resources/db/ddl.sql.
- Configure your database address in application-db.
- Configure the Redis address in config.
- Start the application.
To test the crawlers, run HotSpotCrawlerTest.java.
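To change the crawl schedule, modify the cron value of the @Scheduled annotation in CrawlerTask. It takes a Spring cron expression with six fields (second, minute, hour, day of month, month, day of week), which you can generate with an online cron expression builder.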
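import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component // a Spring bean; scheduling must also be enabled with @EnableScheduling
public class CrawlerTask {
    // Fields: second minute hour day-of-month month day-of-week.
    // "0 */10 9-23 * * *" fires every 10 minutes from 09:00 to 23:50.
    @Scheduled(cron = "0 */10 9-23 * * *")
    public void crawl() {
        // ...
    }
}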
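To check what a cron expression does before editing CrawlerTask, the hypothetical helper below (CronFiringTimes, not part of Spring or this project) lists the times of day at which it fires. It is deliberately limited: it supports only the *, */N, A-B, and single-number forms in the second, minute, and hour fields, and it assumes the three date fields are *. Spring's own parser handles far more.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Spring's CronExpression parser: lists the times of day
// at which a six-field cron expression ("second minute hour day month weekday")
// fires, assuming the day-of-month, month and day-of-week fields are all "*".
// Supported field forms: "*", "*/N", "A-B" and a single number.
final class CronFiringTimes {

    // Returns true if value matches one cron field.
    static boolean matches(String field, int value) {
        if (field.equals("*")) {
            return true;
        }
        if (field.startsWith("*/")) {
            int step = Integer.parseInt(field.substring(2));
            return value % step == 0; // steps count from 0 for seconds, minutes and hours
        }
        if (field.contains("-")) {
            String[] bounds = field.split("-");
            return value >= Integer.parseInt(bounds[0]) && value <= Integer.parseInt(bounds[1]);
        }
        return value == Integer.parseInt(field);
    }

    // Walks every second of the day and keeps those matching the first three fields.
    static List<LocalTime> firingTimes(String cron) {
        String[] fields = cron.trim().split("\\s+");
        List<LocalTime> times = new ArrayList<>();
        for (int h = 0; h < 24; h++) {
            for (int m = 0; m < 60; m++) {
                for (int s = 0; s < 60; s++) {
                    if (matches(fields[0], s) && matches(fields[1], m) && matches(fields[2], h)) {
                        times.add(LocalTime.of(h, m, s));
                    }
                }
            }
        }
        return times;
    }

    public static void main(String[] args) {
        List<LocalTime> times = firingTimes("0 */10 9-23 * * *");
        System.out.println(times.size() + " runs, first " + times.get(0)
                + ", last " + times.get(times.size() - 1));
        // prints "90 runs, first 09:00, last 23:50"
    }
}
```

With the example expression it reports 90 runs per day (15 hours times 6 runs per hour). A single daily run at 3 p.m. would be 0 0 15 * * *.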
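To add a new platform:
- First, insert a record for the platform into the platform table hot_platform. For example, the following row registers Weibo (微博); its text columns hold Weibo's Chinese name, description, and slogan: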
INSERT INTO sky_eye_system.hot_platform VALUES (2, '微博', 'https://ts3.cn.mm.bing.net/th?id=ODLS.05d45f55-2151-4d66-83e5-d10018607094&w=32&h=32&qlt=90&pcl=fffffa&o=6&pid=1.2', '随时随地发现新鲜事!微博带你欣赏世界上每一个精彩瞬间,了解每一个幕后故事。分享你想表达的,让全世界都能听到你的心声!', 'https://weibo.com', '随时随地发现新鲜事!', '王志东', null, null, 0);
- Add a crawler class for the platform under src/main/java/cn/shoxiongdu/SkyEyeSystem/task/hotspot/crawl/impl that implements the HotDataCrawler interface:
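import java.util.ArrayList;
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
// (imports for the project's HotDataCrawler, HotSpot, Platform and PlatformMapper omitted)

@Component // register the crawler with the Spring container
public class XXXCrawler implements HotDataCrawler {

    // id of this platform's row in hot_platform
    private static final Long PLATFORM_ID = ${platformId};

    @Autowired // without injection, platformMapper would be null in getPlatform()
    private PlatformMapper platformMapper;

    @Override
    public List<HotSpot> crawlHotSpotData() {
        List<HotSpot> hotSpotList = new ArrayList<>();
        // Custom crawling logic: fetch the platform's hot list and add one HotSpot per entry
        return hotSpotList;
    }

    @Override
    public Platform getPlatform() {
        return platformMapper.selectById(PLATFORM_ID);
    }
}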
- Implement crawlHotSpotData: run your custom crawling logic, wrap the crawled data in a list of HotSpot objects, and return it.
- Set the constant PLATFORM_ID to the id of the platform's record in hot_platform.
- Register the class in the Spring container by annotating it with @Component or @Service.
- That's it. The scheduled task now runs your crawling logic and stores the results, and the platform's data is displayed on the home page.
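A crawlHotSpotData implementation usually splits into fetching a page and parsing it into ranked entries. The sketch below illustrates both halves using only the JDK, under loud assumptions: HotItem is a hypothetical stand-in for the project's HotSpot class (whose fields are not documented here), and the regular expression targets made-up <a class="hot" href="..."> markup. A real platform needs its own selectors or JSON API, and many crawlers use an HTML parser rather than regular expressions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two halves of a crawlHotSpotData implementation.
// HotItem stands in for the project's HotSpot class; the regex assumes made-up
// markup of the form <a class="hot" href="...">title</a>.
final class HotListCrawlerSketch {

    record HotItem(int rank, String title, String url) {}

    private static final Pattern HOT_LINK =
            Pattern.compile("<a class=\"hot\" href=\"([^\"]+)\">([^<]+)</a>");

    // Downloads the page body with the JDK's built-in HTTP client (Java 11+).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Extracts entries in page order; ranks start at 1.
    static List<HotItem> parse(String html) {
        List<HotItem> items = new ArrayList<>();
        Matcher matcher = HOT_LINK.matcher(html);
        while (matcher.find()) {
            items.add(new HotItem(items.size() + 1, matcher.group(2).trim(), matcher.group(1)));
        }
        return items;
    }

    public static void main(String[] args) {
        String html = "<a class=\"hot\" href=\"/t/1\">First topic</a>"
                + "<a class=\"hot\" href=\"/t/2\">Second topic</a>";
        parse(html).forEach(System.out::println);
        // HotItem[rank=1, title=First topic, url=/t/1]
        // HotItem[rank=2, title=Second topic, url=/t/2]
    }
}
```

Keeping parse separate from fetch lets the parsing logic be unit-tested against saved HTML without network access.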
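One plausible way for the scheduled task to run every registered crawler is for Spring to inject all HotDataCrawler beans as a List<HotDataCrawler>, which Spring supports for any bean type. The plain-Java sketch below imitates that with an ordinary List. Crawler is a simplified, hypothetical stand-in for HotDataCrawler, and whether the project dispatches this way is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a scheduled task running every registered crawler.
// In Spring, a List<HotDataCrawler> can be injected with all implementations;
// here a plain List stands in for that, and Crawler stands in for HotDataCrawler.
final class CrawlerDispatchSketch {

    interface Crawler {
        String platformName();
        List<String> crawl() throws Exception;
    }

    // Runs each crawler in turn; a failure is logged and skipped so other platforms still run.
    static Map<String, List<String>> runAll(List<Crawler> crawlers) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Crawler crawler : crawlers) {
            try {
                results.put(crawler.platformName(), crawler.crawl());
            } catch (Exception e) {
                System.err.println("Crawl failed for " + crawler.platformName() + ": " + e.getMessage());
            }
        }
        return results;
    }

    // Demo helper: builds a crawler that always returns the given titles.
    static Crawler fixed(String name, List<String> titles) {
        return new Crawler() {
            public String platformName() { return name; }
            public List<String> crawl() { return titles; }
        };
    }

    public static void main(String[] args) {
        Crawler broken = new Crawler() {
            public String platformName() { return "Broken"; }
            public List<String> crawl() throws Exception { throw new Exception("timeout"); }
        };
        Map<String, List<String>> results =
                runAll(List.of(fixed("Weibo", List.of("topic A")), broken, fixed("Zhihu", List.of("topic B"))));
        System.out.println(results); // prints {Weibo=[topic A], Zhihu=[topic B]}
    }
}
```

Catching exceptions per crawler means that one platform timing out or changing its page layout does not block the others.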
Contributions are what make the open source community a great place to learn, inspire, and create. Any contribution is greatly appreciated.
- Fork the project
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a pull request
Distributed under the MIT License. Please follow the license terms; see LICENSE.
- ShaoxiongDu: email@shaoxiongdu.cn
- WeChat: 15603430511
- Personal blog: https://shaoxiongdu.cn