Extracting other content from Baike #24

Open
Garethyu opened this issue Apr 28, 2022 · 3 comments

Comments

@Garethyu

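Hello, I ran your project successfully. It is a great project.

However, most of the crawled content is about people. Where should I make changes so that it crawls other kinds of content?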

@adventurexw

I have the same problem. Changing this line does not seem to have any effect: #start_urls = ['https://baike.baidu.com/item/文汇报']

@lixiang0
Owner

The simple approach: if you want to crawl flowers, for example, start the crawl from a flower entry, such as https://baike.baidu.com/item/荷花
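
A minimal sketch of the suggestion above, assuming the spider reads its seeds from a `start_urls` list like the one quoted earlier in this thread. The helper name `seed_urls` and the term list are hypothetical; only the `https://baike.baidu.com/item/<term>` pattern comes from this thread. Percent-encoding the term is often handled by the HTTP client anyway, but it keeps the URL ASCII-safe.

```python
from urllib.parse import quote

# URL pattern taken from the thread; everything else here is illustrative.
BAIKE_ITEM_PREFIX = "https://baike.baidu.com/item/"


def seed_urls(terms):
    """Build one Baike entry URL per seed term (hypothetical helper)."""
    return [BAIKE_ITEM_PREFIX + quote(term) for term in terms]


# Seed the crawl with a flower entry instead of a person or newspaper entry.
start_urls = seed_urls(["荷花"])
```

Because a link-following crawler only reaches pages linked from its seeds, choosing several seeds from the target topic tends to keep the results closer to that topic.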

@adventurexw

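> The simple approach: if you want to crawl flowers, for example, start the crawl from a flower entry, such as https://baike.baidu.com/item/荷花

That is how I changed it, but it does not seem to work. The crawler seems to pick up from the results of the previous run and continue from there. There also does not seem to be any stop setting, so it just keeps crawling indefinitely. I find this odd too.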
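
The two symptoms described here (the crawl continues from earlier results, and it never stops) can be illustrated with a small, framework-independent sketch. This is not the project's code: `crawl`, `get_links`, `max_pages`, and the toy link graph are all assumptions made for illustration. The point is that a crawler which reuses a `visited` set or queue saved from a previous run keeps following the old frontier, and one without a page budget runs until the link graph is exhausted.

```python
from collections import deque


def crawl(seed, get_links, max_pages=100, visited=None):
    """Breadth-first crawl from `seed`, stopping after `max_pages` pages.

    `get_links(url)` is a hypothetical callback returning outgoing links.
    Pass `visited=None` to start fresh; passing a set saved from an
    earlier run makes the crawl skip every page in that set.
    """
    visited = set() if visited is None else visited
    queue = deque([seed])
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        fetched.append(url)
        queue.extend(link for link in get_links(url) if link not in visited)
    return fetched


# Toy link graph standing in for real pages (illustrative only).
TOY_GRAPH = {
    "荷花": ["睡莲", "莲藕"],
    "睡莲": ["荷花", "水生植物"],
    "莲藕": ["水生植物"],
    "水生植物": [],
}

# Fresh state plus an explicit page budget: the crawl stops after 3 pages.
pages = crawl("荷花", lambda url: TOY_GRAPH.get(url, []), max_pages=3)
```

If the project is built on Scrapy, the `CLOSESPIDER_PAGECOUNT` setting provides a comparable page budget, and a `JOBDIR` left over from an earlier run makes a crawl resume where it stopped. Whether this project relies on either, or keeps its own record of crawled pages elsewhere, is not confirmed in this thread.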
