[BUG] Font objects syntax error while merging a PDF document with another #548

Closed
sagar-kalburgi-ripcord opened this issue Mar 14, 2024 · 17 comments

Comments

@sagar-kalburgi-ripcord

Description

Hi, when I use unipdf to merge the attached PDF file with another PDF file, it throws this error:
one of the font objects syntax is not valid - BaseFont undefined: Dict(\"BaseFont\": DejaVuSans, \"CharProcs\": IObject:567, \"Encoding\": Dict(\"Differences\": [46, period, 48, zero, one, two, three, four, five, six, seven, eight, nine, 75, K, 77, M, 97, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, 119, w, 121, y], \"Type\": Encoding, ), \"FirstChar\": 0, \"FontBBox\": [-1021, -463, 1794, 1233], \"FontDescriptor\": IObject:604, \"FontMatrix\": [0.001000, 0, 0, 0.001000, 0, 0], \"LastChar\": 255, \"Name\": DejaVuSans, \"Subtype\": Type3, \"Type\": Font, \"Widths\": IObject:605, )

Adobe Reader and Chrome's PDF viewer both render the document without reporting any font-related issues, so I'm not sure why only unipdf runs into this.
It may be that the document itself has the font configured incorrectly, but those viewers still handle it correctly.

Expected Behavior

unipdf should merge the documents without errors.

Actual Behavior

The merge fails with the font error shown above. To reproduce, use the unipdf merge functionality on the attached PDF file and another PDF file of your choice.

Attachments

S19-1026-NLP-Tasks.pdf

@sampila
Collaborator

sampila commented Mar 14, 2024

Hi @sagar-kalburgi-ripcord,

We tried to reproduce the issue, but merging S19-1026-NLP-Tasks.pdf with our sample PDF, document-header-and-footer-simple.pdf, works fine.

This is likely because your system doesn't have the DejaVuSans font installed.

Example Code

/*
 * Basic merging of PDF files.
 * Simply loads all pages for each file and writes to the output file.
 * See pdf_merge_advanced.go for a more advanced version which handles merging document forms (acro forms) also.
 *
 * Run as: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...
 */

package main

import (
	"fmt"
	"os"

	"github.com/unidoc/unipdf/v3/common/license"
	"github.com/unidoc/unipdf/v3/model"
)

func init() {
	// Make sure to load your metered License API key prior to using the library.
	// If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
	err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
	if err != nil {
		panic(err)
	}
}

func main() {
	if len(os.Args) < 4 {
		fmt.Printf("Requires at least 3 arguments: output_path and 2 input paths\n")
		fmt.Printf("Usage: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...\n")
		// Exit with a non-zero status because the invocation was invalid.
		os.Exit(1)
	}

	// The first argument is the output path; the remaining arguments are the inputs.
	outputPath := os.Args[1]
	inputPaths := os.Args[2:]

	err := mergePdf(inputPaths, outputPath)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Complete, see output file: %s\n", outputPath)
}

func mergePdf(inputPaths []string, outputPath string) error {
	pdfWriter := model.NewPdfWriter()

	for _, inputPath := range inputPaths {
		pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, nil)
		if err != nil {
			return err
		}
		defer f.Close()

		numPages, err := pdfReader.GetNumPages()
		if err != nil {
			return err
		}

		for i := 0; i < numPages; i++ {
			pageNum := i + 1

			page, err := pdfReader.GetPage(pageNum)
			if err != nil {
				return err
			}

			err = pdfWriter.AddPage(page)
			if err != nil {
				return err
			}
		}
	}

	fWrite, err := os.Create(outputPath)
	if err != nil {
		return err
	}

	defer fWrite.Close()

	err = pdfWriter.Write(fWrite)
	if err != nil {
		return err
	}

	return nil
}

Sample file

document-header-and-footer-simple.pdf

Command to run

go run main.go output.pdf document-header-and-footer-simple.pdf S19-1026-NLP-Tasks.pdf 

Output PDF

output.pdf

Could you try installing the DejaVuSans font and running the code again?

@rcosta-ripcord

Hi @sampila,
First of all, thank you for providing suggestions and the sample code!

I've been working with @sagar-kalburgi-ripcord on this, and I tried installing the font, which didn't solve the issue on our service.

Using the code you provided, it does work, indeed.
However, we are trying to ensure PDF/A compatibility, and as such, I made a small change to your code. The same error persists after my changes, and I confirmed that the font was installed!

Given the above, I have a few questions:

  • Do you know if the PDF/A Profile has any limitations, or if there's any way to solve this issue?
  • Do you know if there are any other kind of requirements in terms of fonts that we need to ensure?
  • If DejavuSans in particular is the issue, can we specify or override the fallback fonts?

For reference, here's the code with the changes I mentioned:

/*
 * Basic merging of PDF files.
 * Simply loads all pages for each file and writes to the output file.
 * See pdf_merge_advanced.go for a more advanced version which handles merging document forms (acro forms) also.
 *
 * Run as: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...
 */

package main

import (
	"fmt"
	"os"

	"github.com/unidoc/unipdf/v3/common/license"
	"github.com/unidoc/unipdf/v3/model"
	"github.com/unidoc/unipdf/v3/model/pdfa"
)

func init() {
	// Make sure to load your metered License API key prior to using the library.
	// If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
	err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
	if err != nil {
		panic(err)
	}
}

func main() {
	if len(os.Args) < 4 {
		fmt.Printf("Requires at least 3 arguments: output_path and 2 input paths\n")
		fmt.Printf("Usage: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...\n")
		// Exit with a non-zero status because the invocation was invalid.
		os.Exit(1)
	}

	// The first argument is the output path; the remaining arguments are the inputs.
	outputPath := os.Args[1]
	inputPaths := os.Args[2:]

	err := mergePdf(inputPaths, outputPath)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Complete, see output file: %s\n", outputPath)
}

func mergePdf(inputPaths []string, outputPath string) error {
	pdfWriter := model.NewPdfWriter()
	
	// Apply PDF/A-1a Standard with default options
	pdfWriter.ApplyStandard(model.StandardApplier(pdfa.NewProfile1A(pdfa.DefaultProfile1Options())))

	for _, inputPath := range inputPaths {
		pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, nil)
		if err != nil {
			return err
		}
		defer f.Close()

		numPages, err := pdfReader.GetNumPages()
		if err != nil {
			return err
		}

		for i := 0; i < numPages; i++ {
			pageNum := i + 1

			page, err := pdfReader.GetPage(pageNum)
			if err != nil {
				return err
			}

			err = pdfWriter.AddPage(page)
			if err != nil {
				return err
			}
		}
	}

	fWrite, err := os.Create(outputPath)
	if err != nil {
		return err
	}

	defer fWrite.Close()

	err = pdfWriter.Write(fWrite)
	if err != nil {
		return err
	}

	return nil
}

@sampila
Collaborator

sampila commented Mar 15, 2024

Hi @rcosta-ripcord, thanks for providing more detail on this.

We are investigating this issue.

@sampila
Collaborator

sampila commented Mar 17, 2024

Hi @rcosta-ripcord and @sagar-kalburgi-ripcord,

We are experimenting with the PDF/A process: when the embedded font cannot be obtained from the PDF, we fall back to an available standard font. Here are the current results.

PDF Result

output1.pdf

What do you think? Are the results acceptable, and do they work for your current use case?
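For readers following along, the fallback idea described in this comment can be sketched roughly as below. This is not UniPDF's implementation. `resolveFont` and `pickStandardFont` are hypothetical helpers, and the keyword matching is an assumption made for illustration. The sketch only chooses one of the standard 14 PDF font names; since PDF/A expects fonts to be embedded, a real fix would also need to embed a font program for the substitute.

```go
package main

import (
	"fmt"
	"strings"
)

// pickStandardFont maps an arbitrary font name to one of the standard 14
// PDF font names using crude keyword matching on the name (assumption).
func pickStandardFont(name string) string {
	lower := strings.ToLower(name)
	bold := strings.Contains(lower, "bold")
	italic := strings.Contains(lower, "italic") || strings.Contains(lower, "oblique")

	// Default to the sans-serif family.
	family, regular, slant := "Helvetica", "Helvetica", "Oblique"
	switch {
	case strings.Contains(lower, "mono") || strings.Contains(lower, "courier"):
		family, regular, slant = "Courier", "Courier", "Oblique"
	case strings.Contains(lower, "times") ||
		(strings.Contains(lower, "serif") && !strings.Contains(lower, "sans")):
		family, regular, slant = "Times", "Times-Roman", "Italic"
	}

	switch {
	case bold && italic:
		return family + "-Bold" + slant
	case bold:
		return family + "-Bold"
	case italic:
		return family + "-" + slant
	default:
		return regular
	}
}

// resolveFont returns the font to use and whether a fallback was needed.
// embedded reports which font names have a usable embedded font program.
func resolveFont(name string, embedded map[string]bool) (string, bool) {
	if embedded[name] {
		return name, false
	}
	return pickStandardFont(name), true
}

func main() {
	// No embedded program is available for DejaVuSans in this example.
	font, fallback := resolveFont("DejaVuSans", map[string]bool{})
	fmt.Println(font, fallback)
}
```

A substitute chosen this way may have different glyph widths than the original font, so the rendered output should be compared against the source document.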

@rcosta-ripcord

Hi @sampila, would it be possible to show what the result would look like with the document @sagar-kalburgi-ripcord attached?
I'd also like to know if there's any PR available we can test with

@sampila
Collaborator

sampila commented Mar 18, 2024

Hi, output1.pdf was generated from S19-1026-NLP-Tasks.pdf.

I will create a PR for this specific issue.

@sagar-kalburgi-ripcord
Author

@sampila Sounds good. We can test against your PR and let you know if it works out for us. Thanks!

@sampila
Collaborator

sampila commented Mar 18, 2024

Hi @sagar-kalburgi-ripcord and @rcosta-ripcord, I created the PR and mentioned this issue in it. Could you check it?

@sagar-kalburgi-ripcord
Author

Hi @sampila we were unable to find any PR linked to this issue. Could you pls post a link to it here?

@sampila
Collaborator

sampila commented Mar 19, 2024

The PR can be accessed through the Ripcord account that has been added to the unipdf source code repository; you can access the PR using that account.

@sagar-kalburgi-ripcord
Author

Hi @sampila, neither of us is able to find any PR, although both of us are logged into our Ripcord accounts on GitHub.

@sampila
Collaborator

sampila commented Mar 20, 2024

Hi @sagar-kalburgi-ripcord, could you check again? Your account should already have access to the PR.
You can fork it.

@sagar-kalburgi-ripcord
Author

Hi @sampila. I got access to your org. However, is it possible to add @rcosta-ripcord to your org as well? He is actively testing these changes right now.

@sampila
Collaborator

sampila commented Mar 20, 2024

Regarding that, @rcosta-ripcord can fork from your forked repo, as we currently give access to only one member of an organization.

@rcosta-ripcord

Hi @sampila, @sagar-kalburgi-ripcord and I just tested your PR and it does fix our issue.
Please let us know once you merge and release it so we can update the dependency on our services!

Thank you for your help!

@sampila
Collaborator

sampila commented Mar 23, 2024

Hi @rcosta-ripcord, thanks for the confirmation. We are adding this issue to our test cases and preparing a new UniPDF release.
We will notify you after the release.

@sampila
Collaborator

sampila commented Mar 28, 2024

Hi @sagar-kalburgi-ripcord and @rcosta-ripcord,

We released a new UniPDF version that fixes this issue: https://github.com/unidoc/unipdf/releases/tag/v3.56.0

We are closing this issue for now; you can re-open it if the latest version does not resolve the problem.

Best regards,
Alip

@sampila sampila closed this as completed Mar 28, 2024