`docinfo_from_xmp()` fails on reduced precision dates (`YYYY` / `YYYY-MM`) #576

devnoname120 · 2024-04-07T19:04:53Z

docinfo_from_xmp() fails when given a reduced precision date (YYYY / YYYY-MM): it emits a warning and the DocumentInfo field is not updated.

Investigation

From Adobe's documentation of the XMP Date data type:

A date-time value is represented using a subset of the formats as defined in Date and Time Formats:
YYYY
YYYY-MM
YYYY-MM-DD
YYYY-MM-DDThh:mmTZD
YYYY-MM-DDThh:mm:ssTZD
YYYY-MM-DDThh:mm:ss.sTZD

However, pikepdf uses datetime.fromisoformat(), which according to Python's documentation does not support the YYYY and YYYY-MM date formats:

classmethod date.fromisoformat(date_string)
Return a date corresponding to a date_string given in any valid ISO 8601 format, with the following exceptions:

Reduced precision dates are not currently supported (YYYY-MM, YYYY).
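
The limitation can be confirmed with the standard library alone. The following is a minimal sketch; `try_fromisoformat` is a hypothetical helper name used only for illustration and is not part of pikepdf.

```python
from datetime import datetime
from typing import Optional


def try_fromisoformat(value: str) -> Optional[datetime]:
    """Return the parsed datetime, or None if fromisoformat() rejects the string."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


# Reduced precision dates are rejected; a full date is accepted.
for candidate in ("2023", "2023-11", "2023-11-05"):
    print(candidate, "->", try_fromisoformat(candidate))
```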

How to reproduce

Example 1

import pikepdf

pdf = pikepdf.new()

with pdf.open_metadata() as pdf_metadata:
    pdf_metadata['xmp:CreateDate'] = '2023'

Output:

project/.venv/lib/python3.12/site-packages/pikepdf/models/metadata.py:529: UserWarning: The DocumentInfo field /CreationDate could not be updated from XMP

Example 2

import pikepdf

pdf = pikepdf.new()

with pdf.open_metadata() as pdf_metadata:
    pdf_metadata['xmp:CreateDate'] = '2023-11'

Output:

project/.venv/lib/python3.12/site-packages/pikepdf/models/metadata.py:529: UserWarning: The DocumentInfo field /CreationDate could not be updated from XMP

jbarlow83 · 2024-04-08T20:22:30Z

Surprisingly, both the PDF internal date spec (PDFmark) and XMP allow reduced precision dates.
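
For reference, the PDF date string format is D:YYYYMMDDHHmmSSOHH'mm', where everything after the year is optional, so a reduced precision XMP date can map directly onto a reduced precision PDF date. A rough sketch of that mapping, assuming only YYYY and YYYY-MM inputs; the function name `reduced_xmp_date_to_pdf` is hypothetical, and this is not how pikepdf currently handles these values:

```python
import re

# Matches YYYY or YYYY-MM with a month in the range 01..12.
_REDUCED_XMP_DATE = re.compile(r"[0-9]{4}(-(0[1-9]|1[0-2]))?")


def reduced_xmp_date_to_pdf(value: str) -> str:
    """Convert a YYYY or YYYY-MM XMP date into a PDF date string (hypothetical helper)."""
    if _REDUCED_XMP_DATE.fullmatch(value) is None:
        raise ValueError(f"not a reduced precision XMP date: {value!r}")
    # The PDF form drops the dash and adds the "D:" prefix.
    return "D:" + value.replace("-", "")
```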

It looks like using pendulum.Interval would make it possible to round-trip reduced precision dates, with the interval set to the time period covered, e.g. 2023 would become the interval 2023-01-01 (inclusive) through 2024-01-01 (exclusive). That would allow a consistent representation that could distinguish between the year 2023 and 2023-01-01. Then encode_pdf_date and decode_pdf_date would have to learn about pendulum's datetime and interval classes, without breaking backward compatibility.
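
To make the interval idea concrete, here is a sketch that computes the half-open interval covered by a reduced precision date. It deliberately uses a plain tuple of standard library datetimes instead of pendulum.Interval, and `reduced_precision_interval` is a hypothetical name, not an existing pikepdf function.

```python
import re
from datetime import datetime
from typing import Tuple


def reduced_precision_interval(value: str) -> Tuple[datetime, datetime]:
    """Return (start, end) with start inclusive and end exclusive for YYYY or YYYY-MM."""
    match = re.fullmatch(r"([0-9]{4})(?:-([0-9]{2}))?", value)
    if match is None:
        raise ValueError(f"not a reduced precision date: {value!r}")
    year = int(match.group(1))
    if match.group(2) is None:
        # Year only: the interval covers the whole year.
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    month = int(match.group(2))
    start = datetime(year, month, 1)  # raises ValueError for month 00 or > 12
    # The end is the first day of the next month, rolling over in December.
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end
```

With this representation, "2023" and "2023-01" share a start but have different ends, which is what lets them be told apart from each other and from the single day 2023-01-01.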

Arrow and the Python standard library (along with most other software) will render a date like "2023" as "2023-01-01".
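
As an illustration of that default using only the standard library, datetime.strptime fills missing fields with their minimum values, so the reduced precision is lost once the string is parsed:

```python
from datetime import datetime

# Parsing with an explicit format succeeds, but missing fields default to 1 / 0.
year_only = datetime.strptime("2023", "%Y")
year_month = datetime.strptime("2023-11", "%Y-%m")
full_date = datetime.strptime("2023-01-01", "%Y-%m-%d")

print(year_only.isoformat())
print(year_month.isoformat())
# The year-only value is indistinguishable from January 1st of that year.
print(year_only == full_date)
```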

I can't say this issue is high priority from my perspective, and it will be fussy, but PRs are welcome if you want to see it tackled sooner.

devnoname120 mentioned this issue Apr 7, 2024

Just replace the delimiters in raw date devnoname120/google-play-book-downloader#14

Merged

jbarlow83 added the bug label Apr 8, 2024
