This repository has been archived by the owner on Jan 9, 2022. It is now read-only.

Add support for a new website #22

Open
TonySue2000 opened this issue Nov 10, 2018 · 9 comments

Comments

@TonySue2000

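Could the maintainers add download support for the MOOC site "北京高校优质课程研究会" (http://www.livedu.com.cn)?

I wanted to contribute some code to this project, but unfortunately I don't know Python, so this small contribution is all I can offer. My apologies.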

@SigureMo
Contributor

I'll give it a try when I have time, but I probably won't have any in the near future.

@SigureMo
Contributor

SigureMo commented Nov 11, 2018

@TonySue2000 Done. Please take a look at the latest update in issue #23.

@TonySue2000
Author

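Thanks for the contribution, but this feature still seems to need some work.
Is this the right structure for the cookies? It seems much shorter than the NetEase one: name=value; name=value; JSESSIONID=XXXXXXXXXXXX
After pressing Enter, though, I get the following error: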
Traceback (most recent call last):
File "mooc.py", line 103, in <module>
main()
File "mooc.py", line 96, in main
livedu.start(args.url, config, cookies)
File "/home/tony/下载/course-crawler-master/mooc/livedu.py", line 165, in start
course_info = get_summary(url)
File "/home/tony/下载/course-crawler-master/mooc/livedu.py", line 27, in get_summary
for chapter_lable in home_soup.find('div', class_='vice-main-kcap')
AttributeError: 'NoneType' object has no attribute 'find'
How can I fix this? @SigureMo
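
For readers checking their own input: a cookie string of the form quoted above is just a list of semicolon-separated name=value pairs. The following is a minimal sketch of how such a string could be split into a dictionary. `parse_cookie_string` is a hypothetical helper written for illustration; it is not the code in mooc.py.

```python
def parse_cookie_string(raw):
    """Split a string like 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only, since a value may itself contain '='.
        name, value = pair.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Placeholder values, same shape as the string quoted in the comment.
cookies = parse_cookie_string('name=value; JSESSIONID=XXXXXXXXXXXX')
print(cookies)
```

A short string is not a problem in itself: what matters is that the session cookie the site expects is present.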

@SigureMo
Contributor

SigureMo commented Nov 21, 2018

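The cookie structure should be fine. It would help if you could provide the URL of the specific course; I'll look into what went wrong during the day. @TonySue2000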

@TonySue2000
Author

@SigureMo
Contributor

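Done~

Brief summary of the problem:
livedu stores essentially all of its data in the page itself, with almost no asynchronous loading, so parsing it is tedious but not hard. However, the study page shows only part of each title (the rest is simply cut off with "..."), so I parse chapter_name from the course homepage and lesson_name from the individual lesson entries on the study page. Because everything is parsed out of the page, the code depends heavily on the pages being consistent. The error just now occurred while parsing chapter_name from the homepage: I hadn't expected there to be two kinds of homepage. There may well be a third kind, but I haven't gone looking for it; I'll deal with it if it comes up.
The page that just failed is 悖论:思维的魔方, while the two courses I tested with earlier look like this one: 人工智能. Fortunately the study pages are all the same, otherwise I would have had to write two sets of code...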
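
The layout problem described above could be handled by trying one parser per known homepage layout and failing loudly when none of them matches. This is only a sketch under assumptions: `parse_with_fallback`, `parse_layout_a`, and `parse_layout_b` are hypothetical names, and plain dictionaries stand in for parsed pages. It is not the fix that was committed.

```python
def parse_with_fallback(page, parsers):
    """Try each layout-specific parser in order; return the first non-empty result."""
    errors = []
    for parser in parsers:
        try:
            result = parser(page)
        except (AttributeError, KeyError, IndexError) as exc:
            # This layout did not match; remember why and try the next one.
            errors.append(f'{parser.__name__}: {exc!r}')
            continue
        if result:
            return result
        errors.append(f'{parser.__name__}: no chapters found')
    raise ValueError('no known page layout matched: ' + '; '.join(errors))


def parse_layout_a(page):
    """Hypothetical parser for the first homepage layout."""
    return [chapter['title'] for chapter in page['layout_a_chapters']]


def parse_layout_b(page):
    """Hypothetical parser for the second homepage layout."""
    return [chapter['title'] for chapter in page['layout_b_chapters']]


# A page in the second layout: the first parser fails, the second succeeds.
page = {'layout_b_chapters': [{'title': 'Chapter 1'}]}
print(parse_with_fallback(page, [parse_layout_a, parse_layout_b]))
```

If a third layout turns up, the error message lists why each known parser failed, and supporting the new layout means adding one more parser to the list.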

Fix commit:
🐛 Fix bug of livedu

@TonySue2000
Author

Hmm, after entering it I got another error??? I'll just paste the command-line output.
`tony@kali:~/下载/course-crawler-master$ python3 mooc.py http://www.livedu.com.cn/ispace4.0/moocxjkc/toKcView.do?kcid=216
输入 Cookie:

name=value; name=value; JSESSIONID=AE351313B95DE23F83B3
Traceback (most recent call last):
File "mooc.py", line 103, in <module>
main()
File "mooc.py", line 96, in main
livedu.start(args.url, config, cookies)
File "/home/tony/下载/course-crawler-master/mooc/livedu.py", line 171, in start
course_info = get_summary(url)
File "/home/tony/下载/course-crawler-master/mooc/livedu.py", line 23, in get_summary
name = study_soup.find('dl', class_='content-a-title').find('dt').find('span').string
AttributeError: 'NoneType' object has no attribute 'find'
`
This shouldn't have anything to do with Linux, right? After all, one of Python's big selling points is that it is cross-platform.
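
The AttributeError in the traceback above is what a chained `.find(...)` call typically produces when an earlier `find` returns `None`, for instance because the fetched page is not the one the parser expects. Below is a minimal sketch of a lookup that reports which element was missing. It uses the standard library's `xml.etree.ElementTree`, whose `find` also returns `None` on a miss, as a stand-in for the project's parser; `find_chain` is a hypothetical helper, not part of livedu.py.

```python
import xml.etree.ElementTree as ET


def find_chain(node, *paths):
    """Follow successive find() calls, raising a descriptive error on a miss."""
    for path in paths:
        found = node.find(path)
        if found is None:
            raise LookupError(
                f'element {path!r} not found; the page may be an error page '
                'or use a different layout'
            )
        node = found
    return node


# A page with the expected structure.
good = ET.fromstring('<div><dl><dt><span>Course name</span></dt></dl></div>')
print(find_chain(good, 'dl', 'dt', 'span').text)

# A page without it: the error names the missing element instead of
# surfacing as "'NoneType' object has no attribute 'find'".
bad = ET.fromstring('<div><p>unexpected content</p></div>')
try:
    find_chain(bad, 'dl', 'dt', 'span')
except LookupError as exc:
    print(exc)
```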

@SigureMo
Contributor

SigureMo commented Nov 22, 2018

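Tested fine on Win10 and Ubuntu Server 16.04; please try again a few times. When I tested at noon I did hit an error at this spot once, but I couldn't reproduce it afterwards and don't know what caused it. My guess is an unstable network connection.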
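
If the failure really is transient, retrying the fetch-and-parse step a few times is one way to work around it. A minimal sketch, assuming the failure surfaces as an exception; `retry` and `flaky_summary` are hypothetical names, and nothing here is taken from the project's code.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)  # brief pause before trying again


calls = {'count': 0}


def flaky_summary():
    """Stand-in for a fetch-and-parse step that fails on its first two calls."""
    calls['count'] += 1
    if calls['count'] < 3:
        raise AttributeError("'NoneType' object has no attribute 'find'")
    return 'course summary'


# delay=0 keeps the demonstration instant; a real run would wait between tries.
print(retry(flaky_summary, attempts=5, delay=0, exceptions=(AttributeError,)))
```

Retrying only hides the symptom. If the error keeps coming back, checking what the server actually returned (for example an expired-session page) is more informative.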

[Screenshot: tim 20181122212134]

@TonySue2000
Author

It worked. Many thanks!
