SkyEyeSystem

A web-wide hotspot crawler project based on Spring Boot
中文 · English

About the Project

Every day at 3 p.m., the project crawls trending-search data from across the web, including:

Weibo Hot Search
Bilibili Hot Search
CSDN Hot Search
Zhihu Hot Search
Toutiao (Today's Headlines)
Baidu Hot Search
Juejin
36Kr
QQ News
SSPAI (Shaoshupai)

After the data is crawled:

The raw data is stored in MySQL.
Word-frequency statistics are computed and stored in Redis.
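
As a rough illustration of the word-frequency step, the sketch below counts words across a list of crawled titles. The class `WordFrequencySketch` and its whitespace tokenization are assumptions made for this example, not the project's actual implementation; Chinese titles would need a proper word segmenter.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the project: counts how often each word
// appears across a list of hot-search titles.
class WordFrequencySketch {

    static Map<String, Integer> countWords(List<String> titles) {
        // LinkedHashMap keeps words in the order they were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String title : titles) {
            // Assumption: words are separated by whitespace.
            for (String word : title.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Spring Boot release", "spring cron tips");
        // prints {spring=2, boot=1, release=1, cron=1, tips=1}
        System.out.println(countWords(titles));
    }
}
```

Each entry of the resulting map could then be written to Redis, for example by incrementing a sorted-set score per word with ZINCRBY. Which Redis structure the project actually uses is not stated here.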

Quick Start

Here's how to get the project running quickly.

Prerequisites

Make sure Maven is installed; the project uses it as its build tool.

Installation

Sync the Maven dependencies.
Execute the SQL script: [SQL script](src/main/resources/db/ddl.sql)
Configure your database address in the application db config.
Configure the Redis address in the config.
Start the application.

Usage

1. Run the crawler manually

Run HotSpotCrawlerTest.java.

2. Configure the crawler schedule

Modify the cron value of the @Scheduled annotation in CrawlerTask. It accepts a standard cron expression, which can be generated with an online cron expression builder. Spring's cron format has six fields, starting with seconds.

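import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class CrawlerTask {

    // Fields: second minute hour day-of-month month day-of-week.
    // This expression fires every 10 minutes from 09:00 through 23:50.
    @Scheduled(cron = "0 */10 9-23 * * *")
    public void crawl() {
        // ...
    }
}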
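
To make the expression `0 */10 9-23 * * *` easier to read, the sketch below expands a single cron field into the values it matches. `CronFieldSketch` is a hypothetical teaching aid that handles only the forms `*`, `*/n`, `a-b`, `a-b/n`, `a/n`, and a plain number; it is not how Spring parses cron expressions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: expands one cron field into the values it matches.
class CronFieldSketch {

    static List<Integer> expand(String field, int min, int max) {
        int start = min;
        int end = max;
        int step = 1;
        String range = field;

        // Split off an optional "/step" suffix.
        int slash = field.indexOf('/');
        if (slash >= 0) {
            step = Integer.parseInt(field.substring(slash + 1));
            range = field.substring(0, slash);
        }

        if (!range.equals("*")) {
            int dash = range.indexOf('-');
            if (dash >= 0) {
                start = Integer.parseInt(range.substring(0, dash));
                end = Integer.parseInt(range.substring(dash + 1));
            } else {
                start = Integer.parseInt(range);
                // "a/n" runs from a to the field maximum; a bare "a" matches only a.
                end = (slash >= 0) ? max : start;
            }
        }

        List<Integer> values = new ArrayList<>();
        for (int v = start; v <= end; v += step) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(expand("*/10", 0, 59)); // prints [0, 10, 20, 30, 40, 50]
        System.out.println(expand("9-23", 0, 23).size()); // prints 15
    }
}
```

Read this way, the minute field `*/10` selects minutes 0, 10, 20, 30, 40, and 50, and the hour field `9-23` selects the fifteen hours from 9 through 23.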

3. Add a crawler implementation for a new platform

First, add a record for the corresponding platform to the platform table hot_platform. For example:

INSERT INTO sky_eye_system.hot_platform 
VALUES (2, 
        '微博',
        'https://ts3.cn.mm.bing.net/th?id=ODLS.05d45f55-2151-4d66-83e5-d10018607094&w=32&h=32&qlt=90&pcl=fffffa&o=6&pid=1.2',
        '随时随地发现新鲜事！微博带你欣赏世界上每一个精彩瞬间，了解每一个幕后故事。分享你想表达的，让全世界都能听到你的心声！',
        'https://weibo.com', 
        '随时随地发现新鲜事！', 
        '王志东', 
        null, 
        null, 
        0);

Then add the corresponding platform class under [src/main/java/cn/shoxiongdu/SkyEyeSystem/task/hotspot/crawl/impl] and have it implement the HotDataCrawler interface:

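import java.util.ArrayList;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Imports for the project's own HotDataCrawler, HotSpot, Platform, and PlatformMapper types are omitted.
@Component
public class XXXCrawler implements HotDataCrawler {

    // The id of this platform's row in the hot_platform table (2 matches the example INSERT above).
    private static final Long PLATFORM_ID = 2L;

    @Autowired
    private PlatformMapper platformMapper;

    @Override
    public List<HotSpot> crawlHotSpotData() {
        List<HotSpot> hotSpotList = new ArrayList<>();
        // Run your custom crawling logic here and add each result to hotSpotList.
        return hotSpotList;
    }

    @Override
    public Platform getPlatform() {
        return platformMapper.selectById(PLATFORM_ID);
    }
}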

Implement the crawlHotSpotData method with your custom crawling logic, and return the crawled data as a list of HotSpot objects.
Set the constant PLATFORM_ID to the id of your row in the platform table.
Register the implementation class in the Spring container (@Component or @Service).
Done. The scheduled task now runs your crawling logic and stores the results, and the corresponding data is displayed on the home page.
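
How the scheduled task discovers crawler implementations is not shown here. One plausible arrangement, since Spring can inject every bean of an interface type as a list, is a loop like the one sketched below. The nested `HotSpot` and `HotDataCrawler` types are simplified stand-ins declared only for this example, and `crawlAll` is a hypothetical method name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the project's types; the real HotSpot and
// HotDataCrawler have more members than shown here.
class DispatchSketch {

    static class HotSpot {
        final String title;

        HotSpot(String title) {
            this.title = title;
        }
    }

    interface HotDataCrawler {
        List<HotSpot> crawlHotSpotData();
    }

    // Runs every registered crawler; a failure in one does not stop the others.
    static List<HotSpot> crawlAll(List<HotDataCrawler> crawlers) {
        List<HotSpot> all = new ArrayList<>();
        for (HotDataCrawler crawler : crawlers) {
            try {
                all.addAll(crawler.crawlHotSpotData());
            } catch (RuntimeException e) {
                System.err.println("Crawler failed: " + e.getMessage());
            }
        }
        return all;
    }

    public static void main(String[] args) {
        HotDataCrawler ok = () -> Arrays.asList(new HotSpot("topic A"), new HotSpot("topic B"));
        HotDataCrawler broken = () -> {
            throw new IllegalStateException("site unreachable");
        };
        System.out.println(crawlAll(Arrays.asList(ok, broken)).size()); // prints 2
    }
}
```

Catching exceptions per crawler is a design choice made for this sketch: one unreachable site should not prevent the other platforms from being stored.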

Contribute

Contributions make the open source community a great place to learn, inspire, and create. Thank you very much for any contribution.

Fork the project
Create a feature branch
Commit your changes
Push to the branch
Open a pull request

License

Distributed under the MIT License. Please comply with the terms of the license: [MIT](LICENSE)

Contact

ShaoxiongDu email@shaoxiongdu.cn
WeChat: 15603430511
Personal blog: https://shaoxiongdu.cn
