Exception thrown when setting trailer info element to null #432

Open
cacowen opened this issue Mar 26, 2024 · 4 comments
Comments

cacowen commented Mar 26, 2024

When opening specific documents I get an exception:
{"Value cannot be null. (Parameter 'value')"}

Unfortunately, these documents come from an external source, and all of them have this issue. As a workaround, I have to manually open the PDF in Acrobat (or a similar tool) and save it again without changing anything. This seems to add something to the Info entry so that the exception is no longer thrown. I would like to be able to open the original file in code.

Stack trace:

at PdfSharpCore.Pdf.PdfDictionary.DictionaryElements.set_Item(String key, PdfItem value) in PdfSharpCore.Pdf\PdfDictionary.cs:line 49

at PdfSharpCore.Pdf.Advanced.PdfTrailer.Finish() in PdfSharpCore.Pdf.Advanced\PdfTrailer.cs:line 158

at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy) in PdfSharpCore.Pdf.IO\PdfReader.cs:line 380

This happens in this code when trying to set the info element to null:
https://github.com/ststeiger/PdfSharpCore/blob/cdf089b6c4d6b379aead95f463911dd009ae194e/PdfSharpCore/Pdf.Advanced/PdfTrailer.cs#L191C13-L198C14

iref = _document._trailer.Elements[PdfTrailer.Keys.Info] as PdfReference;
if (iref != null && iref.Value == null)
{
    iref = _document._irefTable[iref.ObjectID]; // <-- this comes back as `null`
    Debug.Assert(iref.Value != null);
    _document._trailer.Elements[Keys.Info] = iref; // <-- this causes the exception
}

Expected behavior:

It should not crash: either allow setting the value to null, or skip setting the element if the value is null.
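
To make the "skip if null" option concrete, here is a minimal sketch of such a guard. It uses hypothetical stand-in types (`Ref`, `TrailerFixup`, plain dictionaries) rather than the real PdfSharpCore classes, so it only illustrates the control flow and is not a patch for PdfTrailer.cs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT the PdfSharpCore types.
public sealed class Ref
{
    public int ObjectNumber;
    public object Value;
}

public static class TrailerFixup
{
    // Mirrors the shape of the snippet above: re-resolve the Info reference through
    // the xref table, but leave the trailer untouched when the lookup finds nothing.
    // Returns false when the reference could not be resolved.
    public static bool TryFixInfo(Dictionary<string, Ref> trailer, Dictionary<int, Ref> xref)
    {
        // No Info entry, or it already has a value: nothing to repair.
        if (!trailer.TryGetValue("/Info", out Ref iref) || iref == null || iref.Value != null)
            return true;

        if (xref.TryGetValue(iref.ObjectNumber, out Ref resolved) && resolved != null)
        {
            trailer["/Info"] = resolved;
            return true;
        }

        // Unresolved: skip the assignment instead of storing null.
        return false;
    }
}
```

Returning a flag instead of throwing lets the caller decide whether to log the problem or drop the entry. As a later comment in this thread notes, skipping the assignment alone may not be enough for every affected file.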

GeorgRottensteiner commented Apr 23, 2024

I'm running into the exact same issue. Since my use case is automated extraction, manually repairing the file is not an option.

Edit: Simply skipping the assignment of the reference does not fully help in my case. The document ends up with zero pages, although it displays fine in any other PDF viewer. Unfortunately, I cannot share the document as it contains private data.

StLange commented May 25, 2024

It seems that the PDF is formatted incorrectly and that Acrobat can fix it.
We would like to fix this, but that is not possible without the PDF file. Please send the file to “issues (at) pdfsharp.net”.
We will keep the PDF file confidential and use it only to fix the bug.

cacowen commented May 27, 2024

The file has been sent. The issue still occurs in PdfSharpCore, but the file can be read by PdfSharp, PdfPig, and others. Thank you.

StLange commented May 27, 2024

I have received the file. The reason for the issue is that the reference to object 312 is mentioned twice in the file. In PDFsharp (not in PdfSharpCore) I fixed this by removing the first entry when an identical second entry occurred. See original source code from PDFsharp 6.1 below.

In PdfSharpCore just call ObjectTable.Remove(iref.ObjectID) if the object is already in the table. This is line 75 in PdfCrossReferenceTable.cs in PdfSharpCore. I did not test it, but I’m pretty sure that it works.

        /// <summary>
        /// Adds a cross-reference entry to the table. Used when parsing the trailer.
        /// </summary>
        public void Add(PdfReference iref)
        {
            if (iref.ObjectID.IsEmpty)
                iref.ObjectID = new(GetNewObjectNumber());

            // ReSharper disable once CanSimplifyDictionaryLookupWithTryAdd because it would not build with .NET Framework.
            if (ObjectTable.ContainsKey(iref.ObjectID))
            {
#if true_
                // Really happens with existing (bad) PDF files.
                // See file 'Detaljer.ARGO.KOD.rev.B.pdf' from https://github.com/ststeiger/PdfSharpCore/issues/362
                throw new InvalidOperationException("Object already in table.");
#else
                // We remove the existing one and use the latter reference.
                // HACK: This is just a quick fix that may not be the best solution in all cases.
                // On GitHub user packdat provides a PR that orders objects. This code is not yet integrated,
                // because releasing 6.1.0 had a higher priority. We will fix this in 6.2.0.
                // However, this quick fix is better than throwing an exception in all cases.
                PdfSharpLogHost.PdfReadingLogger.LogError("Object '{ObjectID}' already exists in xref table. The latter one is used.", iref.ObjectID);
                ObjectTable.Remove(iref.ObjectID);
#endif
            }
            ObjectTable.Add(iref.ObjectID, iref);
        }
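
The "remove the existing entry, then add the later one" idea can be tried in isolation with a small model. The sketch below uses hypothetical types (`XrefEntry`, `MiniXrefTable`) built on a plain `Dictionary`; it is not the PdfSharpCore `PdfCrossReferenceTable`, and the `Position` field is only there to tell the two entries apart.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of a cross-reference table; not the PdfSharpCore class.
public sealed class XrefEntry
{
    public int ObjectNumber;
    public long Position;
}

public sealed class MiniXrefTable
{
    private readonly Dictionary<int, XrefEntry> _table = new Dictionary<int, XrefEntry>();

    public int Count => _table.Count;

    // Adds an entry. If the object number is already present, the earlier entry
    // is removed so that the later one is used instead of throwing.
    public void Add(XrefEntry entry)
    {
        if (_table.ContainsKey(entry.ObjectNumber))
            _table.Remove(entry.ObjectNumber);
        _table.Add(entry.ObjectNumber, entry);
    }

    // Returns the entry for the object number, or null if there is none.
    public XrefEntry Get(int objectNumber) =>
        _table.TryGetValue(objectNumber, out XrefEntry e) ? e : null;
}
```

Adding object 312 twice to this model leaves a single entry, the second one. Whether "last entry wins" is right for every malformed file is an open question; the code comments above describe it as a quick fix.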
