-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential fix for the celery beat scheduler issue when not running in UTC #4324
Conversation
Granted my PR is commenting out a real old line of code, and yet all the unit tests seemed to pass and it did solve this issue in my testing. Looking from collaborator feedback. |
That's quite a scary change you came up with 😃 |
@thedrow I suspect an alternative change might be to check if the dt passed to localize already has tzinfo, and if it does not already have it to still call .astimezone(tz) on it. It seems there are some usages of localize outside the scope of what I was testing, but it also seems there are lacking unit tests around these date time expectations, because as I am able to show, this line of code from 2012 breaks no tests and apparently hinders the BaseSchedule when it calls to_local() without being in UTC time, hence this bug. |
The real fix should be to use pytz for localizing timezones since it should handle it correctly. |
Are you suggesting that the pytz should be called directly and replace this localize(dt, tz) method? It seems this is a compat method that tries to support multiple versions of pytz. For example, its calling tz.normalize which is present in pytz 2012f however in the latest pytz 2017.2 this always raises AttributeError. I'd be happy to add some test coverage as soon as I finish rationalizing what the correct code change to resolve this bug actually is. I think my strategy would be to test the scheduler(s) which would add tests these methods implicitly, though let me know if you have more ideas of what test cases we should be adding are. |
We should definitely drop support for old pytz versions. We can change our requirements accordingly. |
I am spending some time reading up on pytz tonight, but after an initial audit of projects I know about there are a wide range of pinned versions of pytz out there. If we can require the latest and greatest to release whatever this refactor will amount to great, but I also just read about an alternative to pytz called pendulum that seems fairly promising in general, curious your thoughts on pytz over pendulum, this is an interesting read: https://pendulum.eustace.io/blog/a-faster-alternative-to-pyz.html |
So assuming that we stick with pytz, I am seeing a lot of inconsistency between the way that pytz talks about how it "only supports two ways of building a localized time" and "the preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans." Anyway, it seems like all of the time code in celery deserves some kind of review. I just searched open issues for time and I while this PR doesn't fix the issue (its already i trunk) it does indicate the time methods in celery have had a lot of disparate changes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a small flake8 error.
Other than that, is this PR ready for merge?
Does it fix the issue?
I am aware of the error and no @thedrow its not ready for merge. I have some local changesets to this, started adding some unit tests, but the problem is the two I added pass with or without my change and the third one fails both before and after this change. I'll let you know when I think I am complete. |
@thedrow Well I added some unit tests that I thought would fail on the old code but alas they do not. I know for sure the bug is fixed with this code change however, it had something to do with converting from the already local time (Ex: 2017-10-11 22:40:24.316813-04:00) calling astimezone would yield an incorrect (Ex: 2017-10-12 22:40:24.316813+20:00) which is the same timestamp with a different zone applied, thus incorrect when it gets to is_due the tasks are always seen as due. I think to really show a unit test of the bug that was fixed, we would have to somehow understand and add tests of the is_due time remaining logic. I'm not sure how much time I have to dedicate to that pursuit right now (no pun intended) however I am happy that this PR passes the CI, adds new tests of the method I changed and actually fixes the bug outlined in the issue tracker. Mostly Done? I'm not sure why I got a flake 8 failure this time ... fixed the prior line length issue. The log seems to contain no real clue on the failure this time and I'm not overly familiar with tox running it locally: https://travis-ci.org/celery/celery/jobs/291876905 |
@@ -300,7 +300,10 @@ def make_aware(dt, tz): | |||
|
|||
def localize(dt, tz): | |||
"""Convert aware :class:`~datetime.datetime` to another timezone.""" | |||
dt = dt.astimezone(tz) | |||
if is_naive(dt): # Ensure timezone aware datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code section is what maybe_make_aware
performs, isn't that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@georgepsarakis Trying to get this cleaned up for possible merge for 4.2. I agree that it is a similar check to maybe_make_aware but the bug here is that you can't safely call datetime.astimezone on all timezones. It doesn't make sense to call maybe_make_aware
here because that just in turn calls localize
which would create an infinite recursive loop. Are you suggesting I actually make a different change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. My comment was mostly on finding out if the logic is duplicated, or a refactoring would be a better approach.
@matteius do not spend time on these flake8 errors, I think the latest version now detects more possible issues in the code. We will fix them in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matteius could you please re base and complete the remaining work so that this fix could go for 4.2 release?
@auvipy Just spent the last 4 hours trying to do/understand rebase better. After several attempts at pushing out stuff forcfully I think I got it, except who knows if tox will think the imports are in the right order since when I run that locally I get weird results. Strangely, my latest build was canceled and Geoffrey Bauduin started a new build of my PR yet it doesn't show on the output of this page. I am very confused and would like some help closing to bow on this one. |
Very confusing ... I clearly have an lsort issue but not sure the best way to resolve those other than try random orderings which creates more commits for me to rebase. Ahhhh! Also, a lot of PRs are failing with the TOXENV=bandit and the latest builds to succeed don't even list having run TOXENV=bandit ... how is this possible? |
Hi @matteius, thanks for sticking with this. Rebase can certainly be tricky if it's not part of your normal workflow. I've just fixed the problem with bandit, so PRs should build cleanly now on Travis. Not sure about the isort problems - you should be able to see the specific errors in the build log if you push changes. |
make astimezone call in localize more safe; with tests
Thanks @AlexHill at this point I think the PR is in a final state of readiness. Please let me know if there is anything else I can do to shore it up. |
Code review commented were addressed.
make astimezone call in localize more safe; with tests
I first encountered the traced the issue of #4184 in a project space I worked on but the work around was to change to UTC. I was playing with adding logging tonight and used these two django settings to convert between the working and non working version (adding these makes it non-working):
CELERY_ENABLE_UTC = False
CELERY_TIMEZONE = 'America/New_York'
So it seems the issue is that when localize is called by to_local_fallback that it is already time zone aware. In fact it was always aware up the stack, but when it gets converted using astimezone, it becomes a date in the future and so it makes celery beat think that the task is always past due. My print output may not make the most sense, but I annotated a few methods of the same name and you can see in my example the exact moment the datetime went from a -04:00 offset to +20:00. Since this is a common method and this line of code hasn't changed since 2012, I am not sure the exact direction to take fixing the bug, but I do want to fix it. I have much time invested at this point.