Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PDF/A-1b #34

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Backbone81
Copy link

This pull request complements prawnpdf/prawn#1029

.gitignore Outdated
@@ -1,2 +1,3 @@
Gemfile.lock
coverage/*
/.idea
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go into your global .gitignore

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

info: options[:info]
}
store_params[:print_scaling] = options[:print_scaling] if options[:print_scaling]
store_params[:enable_pdfa_1b] = options[:enable_pdfa_1b] if options[:enable_pdfa_1b]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit repetitive. Maybe we can use options.select here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

render_xmp if @xmp_creator_tool || @xmp_create_date || @xmp_modify_date
render_pdf if @pdf_keywords || @pdf_producer
render_dc if @dc_title || @dc_creator || @dc_description
@xml_doc.root.to_xml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admire your perseverance but it feels like this whole class could be replaced by a string interpolation. It would be simpler, faster, and would spare us a dependency.

Speaking of which… Prawn is pure Ruby. All its deps are pure Ruby. It's one of our selling points. means that Nokogiri is not going to get added to Prawn deps. If you're convinced you must use it for this feature you should release it in its own gem instead. We'll be happy to to help you with integration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have to use nokogiri. It was my first shot at the problem because I use it elsewhere. I accept that no external dependencies should be added and will look into changing it to string interpolations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -19,6 +19,18 @@ def initialize(opts = {})

@info ||= ref(opts[:info] || {}).identifier
@root ||= ref(Type: :Catalog).identifier

# PDF/A-1b requirement: XMP metadata
@xmp_metadata ||= ref(Type: :Metadata, Subtype: :XML).identifier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this to be optional. The documents we generate now should not get bigger for no reason with new version of Prawn. Let's include this only in documents that actually have some metadata.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The XMP metadata is now only added to PDF/A-1b documents.

@@ -0,0 +1,10 @@
ICC sRGB v2 Profile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The profile is not code. Please put it in the data directory and please don't forget to add it to the packaged gem as well.

Also as far as I can tell copyright information and profile name is included in the profile itself. The license doesn't seem to require this file distribution along with the profile so maybe drop it. Include the links in commit message for history. That should be enough.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -216,10 +217,12 @@ def render_xref(output)
# Write out the PDF Trailer, as per spec 3.4.4
#
def render_trailer(output)
trailer_id = PDF::Core::ByteString.new(SecureRandom.random_bytes(16))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make is deterministic. We're trying to reduce random changes to the generated documents.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 'deterministic' are you thinking of hashing some document content? Because the PDF spec (10.3 File Identifiers) suggest to use the current time beside some other data. But that would not solve the problem with the idempotent test you mentioned somewhere else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the time is mentioned. But it doesn't seem like a good indicator of content stability. After all, are two documents different if their content is identical? Also a bit earlier in that chapter one can find this:

The first byte string is a permanent identifier based on the contents of the file at the time it was originally created and does not change when the file is incrementally updated. The second byte string is a changing identifier based on the file’s contents at the time it was last updated. When a file is first written, both identifiers are set to the same value.

I would just hash the body. It would work great for our purpose. Also since Prawn doesn't really support incremental updates we can reuse the value for the second part of ID.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at getting both PDF/A-1b support in, and updating doc encryption to modern levels. In the process, I found that there's a potential issue here that will likely force a design decision for Prawn.

Specifically, the first trailer[:ID] value is used in calculating the encryption key for the PDF document. This is missing from the current Prawn implementation (which always assumes an empty :iD value), but is clearly laid out in the spec. So this ID value needs to be present when the encryption dictionary is built (currently when encrypt_document is called) and invariant thereafter.

My current favored approach is:

  1. Calling encrypt_document should initialize the trailer[:iD] value in the exact same way as the PDF/A-1b code does, with the caveat that they may not wind up with the same value if the state of the document is altered between the two calls. Also, need to confirm that the current method for PDF/A-1b excludes an encryption dictionary from the ID calculation.
  2. The PDF/A-1b code only sets trailer[:ID] if it isn't already set. Otherwise it uses the already present value

I think this is simple, involves the minimum change for Prawn and PDF-Core, and is consistent with the spec intentions.

Other options might include moving all encryption_dictionary calculation to the render phase or requiring that encrypt_document be the last call to a document before rendering. On the face of it both of those seem less desirable so I haven't yet thought through the implications.

@pointlessone Thoughts?

@petergoldstein
Copy link
Contributor

@pointlessone what's your current thinking on this PR? Your concerns appear to have been addressed and it (combined with the matching prawn PR) seems like a significant value add. Would it make sense to rebase and see where it stands?

@Backbone81 Not sure if you're engaged on this anymore (it's been a while) but if you have any thoughts they'd be appreciated.

@Backbone81
Copy link
Author

@petergoldstein It has been quite some time. I did need that feature for a project I was working on five years ago and finished my project with a prawnpdf fork containing the current state from those two merge requests. I have since then moved on to other projects and do not have access to the ISO specifications for PDF and PDF/A-1b any more.

I would be fine with somebody else taking my changes and integrating them into the current state of prawnpdf. I think it would be a valuable feature for applications in the financial and insurance industry which do have a regulatory need for standards like PDF/A-1b.

@petergoldstein
Copy link
Contributor

@Backbone81 Let me see if I can rebase this and then we can get @pointlessone 's feedback. Thanks for the initial contribution.

@jlduran
Copy link

jlduran commented Feb 11, 2022

@Backbone81 Let me see if I can rebase this and then we can get @pointlessone 's feedback. Thanks for the initial contribution.

We have been using this patch for a while now, with a few minor tweaks:

Labtec@83606ea

EDIT: We used veraPDF for PDF/A-1b validation.

Copy link
Member

@pointlessone pointlessone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to check if I have a copy of PDF/A standard and if these changes represent the standard sufficiently.

Comment on lines +29 to +30
xmp_metadata.stream = Stream.new
xmp_metadata.stream << xmp_metadata_content.render
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xmp_metadata is a reference. It's smart enough to create a stream when needed so this can be a bit simpler:

Suggested change
xmp_metadata.stream = Stream.new
xmp_metadata.stream << xmp_metadata_content.render
xmp_metadata << xmp_metadata_content.render

icc_profile_name = 'sRGB2014.icc'.freeze

icc_profile_stream = ref(N: 3)
icc_profile_stream.stream = Stream.new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants