bug CORE-4089: Onedrive partitioning fails - datetime formatting error #2638
Can we add a comment to the file so that a search for this function's test brings it up? It will also make sure that anyone who changes the test in the future knows their change might affect test coverage.
Edit: As a note, ideally we should decouple the function's unit test from Sharepoint, since this PR decouples the function from the connector.
caller_name=integration_name,
caller_version=integration_version,
Are Astra changes relevant to our PR?
If they got in unexpectedly, let's remove them; if they are intentional, let's explain them in the PR description and changelog.
Also, where have we seen the Onedrive dates breaking (inconsistent dates, failing ingest tests, or something else)? Or is this PR preventive, in case they break in the future?
I just made new tests that cover ensure_isoformat_datetime.
Improved the PR overview. We are seeing this issue live when processing docs.
LGTM
Fixes the Onedrive bug the same way Ryan fixed the Sharepoint error (both are Microsoft products).
#2591
https://github.com/Unstructured-IO/unstructured/pull/2592/files
We are seeing inconsistent timestamps returned by Onedrive when fetching created and modified dates. Furthermore, future versions of this library will return a datetime object rather than a string.
Changes
Adds logic to guarantee that Onedrive dates are formatted as ISO, regardless of the format provided by the onedrive library.
Bumps the timestamp output format to include the timezone offset (as we do with others).
Adds unit tests for isoformat.
json_to_dict already unit tested here:
https://github.com/Unstructured-IO/unstructured/blob/main/test_unstructured_ingest/unit/test_utils.py
Adds a small change for AstraDB that lets them see which source called their API.
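For illustration only, the date normalization described in the changes above could look roughly like the sketch below. It is not the code from this PR: the helper name `to_iso_with_offset` is hypothetical, and treating timestamps without timezone information as UTC is an assumption made for the example. It accepts either a string or a `datetime` and always returns an ISO 8601 string with an offset.

```python
from datetime import datetime, timezone
from typing import Union


def to_iso_with_offset(value: Union[str, datetime]) -> str:
    """Hypothetical helper: return an ISO 8601 string that always has a UTC offset."""
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, str):
        text = value.strip()
        # Older Python versions reject a trailing "Z" in fromisoformat(),
        # so rewrite it as an explicit +00:00 offset first.
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        parsed = datetime.fromisoformat(text)  # raises ValueError on bad input
    else:
        raise TypeError(f"expected str or datetime, got {type(value).__name__}")

    # Assumption for this sketch: a timestamp without tzinfo is in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.isoformat()


print(to_iso_with_offset("2024-03-12T10:15:30Z"))              # 2024-03-12T10:15:30+00:00
print(to_iso_with_offset(datetime(2024, 3, 12, 10, 15, 30)))   # 2024-03-12T10:15:30+00:00
```

Accepting both input types means a caller would not need to change if the upstream library starts returning `datetime` objects instead of strings.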