Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash during normalization of time zones #1407

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

G-D-Petrov
Copy link
Collaborator

Reference Issues/PRs

Fixes #1406

What does this implement or fix?

Converts time zone data to string when normalizing with MsgPack.
Adds tests to make sure that the crash doesn't reproduce in the future.

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@G-D-Petrov G-D-Petrov force-pushed the 1406-arcticdb-crashes-when-when-trying-to-normalize-timezone-data-with-msgpack branch from 077cc8e to 2b8dd29 Compare March 15, 2024 08:15
@@ -1040,7 +1054,12 @@ def _custom_pack(self, obj):
def _ext_hook(self, code, data):
if code == MsgPackSerialization.PD_TIMESTAMP:
data = MsgPackNormalizer._nested_msgpack_unpackb(data)
return pd.Timestamp(data[0], tz=data[1]) if data[1] is not None else pd.Timestamp(data[0])
val, tz, offset = data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be wise to keep using the numeric indexing so that we can continue to extend if needed.

Also this needs to be able to read old data which only had two fields. So we should add a test that covers the legacy case as well.

if len(data) == 2:
  val, tz = data
  # old method...
else:
  val, tz, offset = data[:3]
  # new methods...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I had forgotten to push these changes.
As for testing the legacy case, we can do it only with the compatibility tests and I have added the needed changes here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the compatibility test change result in a test that exercises the if len(data)==2: block?

poodlewars
poodlewars previously approved these changes Mar 18, 2024
if tz:
# We need to convert the string tz to a timedelta to denormalize to a datetime.timezone
obj = pd.Timestamp(val, tz=tz)
offset = obj.tzinfo.utcoffset(obj).total_seconds()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests for this case, both asserting zero and non-zero offset please.

@G-D-Petrov G-D-Petrov force-pushed the 1406-arcticdb-crashes-when-when-trying-to-normalize-timezone-data-with-msgpack branch from 65e2877 to 09c439a Compare March 22, 2024 07:56
Add another chak for the tests

Address PR comments and add more test cases

Fix edge case on Windows

Fix broken test on windows

Recreate the timestamp with timezone

Coerce all timezone to datetime.timezone

Add metadata to compat tests

Add meta to compat tests

Add a check for the buff len from denormalizing timestamps

Add a note in the docs about using the time zone info to create other time zone types

Add comment about the old PD_TIMESTAMP normalization

Handle case when we have a valid tz from a past version

Fix tz in meta in backwards compat test

Fix install of old protobuf version

Remove outdated assert

Add 0 offset case

Add write of meta data
@G-D-Petrov G-D-Petrov force-pushed the 1406-arcticdb-crashes-when-when-trying-to-normalize-timezone-data-with-msgpack branch from b26cd53 to 6b24588 Compare March 27, 2024 08:23
@jamesmunro
Copy link
Collaborator

This PR does change the stored and returned timezone data. We should likely hold off for a major version, just as with #1410
Is the original "Coordinated Universal Time" windows example issue urgent in any way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArcticDB crashes when when trying to normalize timezone data with MsgPack
3 participants