Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: individual /Resources directory are now properly produced for each document page #1133

Merged
merged 1 commit into from
May 27, 2024

Conversation

Lucas-C
Copy link
Member

@Lucas-C Lucas-C commented Mar 6, 2024

This change causes a slight increase of PDF documents, but makes their structure more valid regarding the PDF spec.

Checklist:

  • The GitHub pipeline is OK (green), meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@Lucas-C
Copy link
Member Author

Lucas-C commented Mar 6, 2024

Could you please review this @andersonhc or @gmischler?

Despite the 100+ reference PDF files modified, this PR onlys affects 2 source Python files and is relatively short.

@Lucas-C Lucas-C force-pushed the individual-page-resources branch 14 times, most recently from b9fdf5e to af159f9 Compare March 7, 2024 07:51
@Lucas-C
Copy link
Member Author

Lucas-C commented Mar 7, 2024

There are some alternative approaches to the one currently used in this PR:

  • instead of 3 extra dicts (.fonts_used_per_page_number / .images_used_per_page_number / .graphics_style_names_per_page_number), we could store those informations as properties in PDFPage instances (in FPDF.pages)
  • instead of storing this information at all, we could generalize the regex-based approach used in FPDF.drawing_context() and perform those regex-matches in OutputProducer based on the final .contents of pages

@andersonhc
Copy link
Collaborator

I will try to make some time for a complete review by tomorrow.
For now I am running some tests and it's interesting your PR makes it clear we are letting unused fonts end up in the output document.

An example:

from fpdf import FPDF

pdf = FPDF()
pdf.add_font("NotoSans", "B", "NotoSans-Bold.ttf")
pdf.add_font("NotoSans", "BI", "NotoSans-BoldItalic.ttf")
pdf.add_font("NotoSans", "I", "NotoSans-Italic.ttf")
pdf.add_font("NotoSans", "", "NotoSans-Regular.ttf")

pdf.set_font("NotoSans", "", 12)
pdf.single_resources_object = False

pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="**Text in bold**", markdown=True)

pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="__Text in italic__", markdown=True)

pdf.output("test1.pdf")

The page 1 will have "F1" and "F4" in the resources dictionary, and page 2 will have "F3" and "F4".
"F2" is added on the final document and not referenced at all. "F4" is added to the resource list because of set_font() although not used.

I have documents with many fallback fonts and there is a considerable amount of unused font data added to the documents.

Might be something to tackle in a future PR.

@andersonhc andersonhc self-requested a review March 10, 2024 11:30
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@andersonhc andersonhc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything else to add. The changes are simple and clear.
I didn't have time to visually inspect all the changed reference files but I did a few and everything looks OK.

fpdf/output.py Show resolved Hide resolved
@Lucas-C Lucas-C force-pushed the individual-page-resources branch from c123b0e to f7229d3 Compare May 27, 2024 15:37
@Lucas-C Lucas-C force-pushed the individual-page-resources branch from f7229d3 to 9b81457 Compare May 27, 2024 15:46
@Lucas-C Lucas-C merged commit 5f2cc81 into master May 27, 2024
13 checks passed
@Lucas-C Lucas-C deleted the individual-page-resources branch May 27, 2024 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants