There are some bugs; I debugged for a while but could not find the cause #2

Open
ohyeah521 opened this issue Sep 28, 2017 · 1 comment

Comments

@ohyeah521

Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "D:\tumblr_spider-master\tumblr_spider-master\tumblr.py", line 61, in run
    self.download(url)
  File "D:\tumblr_spider-master\tumblr_spider-master\tumblr.py", line 26, in download
    res = requests.get(url)
  File "C:\Python27\lib\site-packages\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='the-dirty-mind.tumblr.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x065B2ED0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

Once this happens, the program hangs at this point and does not continue executing.
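One plausible explanation for the hang, offered only as a guess since the queue handling in tumblr.py is not shown here: if each worker thread takes URLs from a queue and the exception escapes `download()`, that thread dies without marking its task as done, so anything waiting on the queue never wakes up. The sketch below reproduces that pattern with the standard library only. `flaky_download`, `worker`, and the URL are hypothetical stand-ins, not code from the project.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7, as in the traceback above


def flaky_download(url):
    # Hypothetical stand-in for a download that fails on a DNS error.
    raise IOError("getaddrinfo failed for %s" % url)


def worker(tasks):
    url = tasks.get()
    flaky_download(url)   # raises, so the next line never runs
    tasks.task_done()


tasks = queue.Queue()
tasks.put("https://example.invalid/")

t = threading.Thread(target=worker, args=(tasks,))
t.start()
t.join()  # the thread has died; its traceback is printed to stderr

# The task was never marked done, so calling tasks.join() here would block forever.
pending = tasks.unfinished_tasks
print("thread alive: %s, unfinished tasks: %d" % (t.is_alive(), pending))
```

The exception only kills the thread it was raised in; the main thread keeps running, which is why the process appears stuck rather than crashing.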

@ohyeah521
Author

This seems to be resolved. The fix is as follows:

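def download(self, url):
    res = None
    try:    # added exception handling
        res = requests.get(url, verify=False, timeout=10)
    except Exception as e:
        # Format with %s: concatenating a str and an exception raises TypeError.
        print("download except: %s" % e)
        return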

==========================================
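url = 'https://%s.tumblr.com/' % user # changed to https here

NUM_WORKERS = 36 # increased the number of threads; otherwise the queue is too small, is easily exhausted, and cannot keep producing work

After changing these three places, the program runs continuously and is no longer interrupted or blocked by exceptions.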
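For anyone adapting this fix, here is a minimal sketch of a worker loop that keeps going after a failed download. It is not the project's actual code: `run_workers`, `fetch`, and `fake_fetch` are hypothetical names, and it assumes the spider uses worker threads fed from a queue. The key points are catching the exception inside the loop and calling `task_done()` in a `finally` block so the queue can always drain.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7


def run_workers(urls, fetch, num_workers=4):
    """Call fetch(url) for every URL; return a list of (url, error message) failures."""
    tasks = queue.Queue()
    failed = []
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            try:
                if url is None:      # sentinel: no more work for this thread
                    return
                fetch(url)
            except Exception as e:
                with lock:           # record the failure and keep the thread alive
                    failed.append((url, str(e)))
            finally:
                tasks.task_done()    # always runs, even on error or return

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)              # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return failed


def fake_fetch(url):
    # Hypothetical stand-in for requests.get(url, timeout=10).
    if "bad" in url:
        raise IOError("getaddrinfo failed")


failed = run_workers(
    ["https://a.example/", "https://bad.example/", "https://b.example/"],
    fake_fetch,
)
print(failed)
```

With this structure a single unreachable host ends up in `failed` instead of killing a thread, so raising the thread count becomes a throughput choice rather than a way to outlast dead workers.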
