Scheduler tasks can hang on dropped HTTP connection #91

icook · 2014-08-27T21:49:15Z

Tasks that make remote requests can hang indefinitely if a socket connection is silently dropped. Since APScheduler will not run two instances of the same task at once, the situation never resolves itself without a restart of the scheduler. A simple solution is to set:

socket._GLOBAL_DEFAULT_TIMEOUT = 60

to cause all socket connections to eventually time out. This is likely the cause of Celery hanging on the simplecrypto/pool_list as well.
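For reference, a minimal sketch of the same idea using the documented `socket.setdefaulttimeout()` call instead of assigning to the private `_GLOBAL_DEFAULT_TIMEOUT` name. The helper names (`make_silent_server`, `read_with_default_timeout`) are made up for illustration and are not part of the scheduler code; the silent server only stands in for a peer that has stopped responding.

```python
import socket


def make_silent_server():
    """Hypothetical stand-in for a dropped peer: listens but never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    return server


def read_with_default_timeout(host, port, timeout_seconds):
    """Connect and read once, relying only on the process-wide default timeout.

    Returns the bytes received, or None if the read timed out.
    """
    previous = socket.getdefaulttimeout()
    # Applies to sockets created after this call that do not set their own timeout.
    socket.setdefaulttimeout(timeout_seconds)
    try:
        conn = socket.create_connection((host, port))
        try:
            return conn.recv(1024)
        except socket.timeout:
            # Without a timeout this recv() would block for as long as the peer stays silent.
            return None
        finally:
            conn.close()
    finally:
        # Restore the earlier default so the sketch has no lasting side effects.
        socket.setdefaulttimeout(previous)
```

In the scheduler itself the equivalent would presumably be a single `socket.setdefaulttimeout(60)` at startup. Libraries that set an explicit timeout on their own sockets are not affected by the default.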

Manifested by Worker Count going to 0 on SimpleVert.com when a connection to a geo stratum was dropped.


ericecook · 2014-08-31T20:18:15Z

Currently 0 worker count on SV. Scheduler restart appears not to have solved the issue.

icook · 2014-09-01T17:32:39Z

Hmm, how long did you wait after the restart? I believe the worker count cache code only runs every 10 minutes. If that's the case, then perhaps I misdiagnosed...

ericecook · 2014-09-01T18:13:38Z

So to clarify - I'm not 100% sure how long I waited - I'd guesstimate ~15 mins.

It's quite possible that whatever is causing the issue happened soonish after I restarted, although I did make several attempts.

I have several times restarted the scheduler and had it fix the bug.

The bug is also occurring (pretty frequently) on SimpleDoge atm as well - so not related to Geo.

icook · 2014-09-01T21:44:59Z

Good to know, I'll look into it more once multi is out.

ericecook · 2014-09-01T22:09:28Z

It actually may be a bug in the new code. I'm not sure exactly when it first appeared - but it's only been on Doge + Vert, and only just recently.

icook · 2014-09-01T23:36:58Z

We haven't deployed code to them in over 3 weeks, so it seems unlikely to be a code change. OVH's network definitely has seemed more flaky lately, so I'm betting it's related to that.

ericecook · 2014-09-01T23:50:16Z

I was thinking it could be an issue introduced by the new Powerpool code.

Possibly it's set up to handle HTTP requests differently or something? idk, it doesn't seem terribly likely - but the timing is suspicious.

icook · 2014-09-02T00:32:47Z

We're not running new powerpools on Doge yet, so probably not.

ericecook · 2014-09-02T00:57:42Z

Yeah, that's right, doh. Well, it's not a network issue - Doge is having the problem, and it doesn't use Geos.

icook added the bug label Aug 27, 2014
