Convert 600dpi pdf to 150dpi pdf #3983

artofit · 2021-07-25T11:26:40Z

artofit
Jul 25, 2021

Hi,
I scanned a paper document at 600 dpi.
The resulting colour file, in.pdf, is 8 MB.

If I run:
convert -density 150 "in.pdf" out.pdf
out.pdf is twice as big as in.pdf.

If I run:
convert -density 150 "in.pdf" -quality 80 -scene 1 dpi150q80-%3d.jpg
then
convert *.jpg -auto-orient allJpegsIntoSingleFile.pdf
allJpegsIntoSingleFile.pdf is 1.8mb

Questions:
1/ How can I convert the 600 dpi PDF to 150 dpi in a single step?
What's wrong with the first command, namely convert -density 150 "in.pdf" out.pdf

2/ convert -density 150 "in.pdf" -quality 80 -scene 1 dpi150q80-%3d.jpg
produces JPEGs that look worse when one zooms in than those from
pdftoppm -jpeg "in.pdf" outJpegs

even though the algorithm and colorspace are the same, and the quality there is only 75.
What can I do to improve the output of convert?

Thanks

snibgo · 2021-07-25T13:23:21Z

snibgo
Jul 25, 2021

You are using PDF as a wrapper for a raster image. The raster image was scanned at 600 dpi, but the PDF itself is only 72 dpi. You can verify this with "magick yourfile.pdf -verbose info:"

As you have discovered, using "-density" before reading the PDF affects the size, i.e. how many pixels you get. If you use 72 dpi, there will be no change. But you want to reduce by a factor of 600/150 = 4 linearly, so 72/4 = 18. Use "-density 18".

1 reply

artofit Jul 25, 2021
Author

magick yourfile.pdf -verbose info:
Indeed, amongst other I see:
Geometry: 577x822+0+0
Resolution: 72x72
Print size: 8.01389x11.4167
Units: Undefined
Colorspace: sRGB
Type: TrueColorAlpha
Base type: Undefined
Endianness: Undefined
Depth: 16/8-bit
Channel depth:
  Red: 8-bit
  Green: 8-bit
  Blue: 8-bit
  Alpha: 1-bit
Channel statistics:
  Pixels: 474294

convert -density 18
But then the output is no longer readable (heavily pixelated).

What I do not understand:
1/ If I scan at 300 dpi, I get the same info (resolution, geometry, etc.); only the number of pixels is slightly different.

2/ convert -density 150 "in.pdf" -quality 80 -scene 1 dpi150q80-%3d.jpg
then
convert *.jpg -auto-orient allJpegsIntoSingleFile.pdf
does a better job.

3/ convert -density 150 in.pdf -quality 80 -scene 1 dpi150q80-%3d.jpg
output jpegs are 1203 x 1702
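
As a rough check on the numbers in this reply: when IM rasterizes a PDF page, the pixel size follows the -density value relative to the 72 dpi at which the page is reported. The sketch below takes the 577-pixel width from the info output above as the page width at 72 dpi. Rounding up is an assumption and may not match the delegate exactly. It suggests why -density 18 gives an unreadable page (only about 145 pixels wide) and why -density 150 gives a width of about 1203 pixels.

```shell
#!/bin/sh
# Approximate pixel size of a rasterized PDF page along one axis.
#   $1: page size at 72 dpi (577 is the width reported in the info output above)
#   $2: value passed to -density
# Rounding up is an assumption; the rasterizer may round differently.
px_at_density() {
    echo $(( ($1 * $2 + 71) / 72 ))
}

px_at_density 577 150   # width in pixels at -density 150
px_at_density 577 18    # width in pixels at -density 18
```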

snibgo · 2021-07-25T18:57:13Z

snibgo
Jul 25, 2021

I don't know what you are trying to do, so advice is difficult.

PDF files can contain multiple images, each with different densities (dpi). And the overall PDF has its own density.

For this reason and others, PDFs are not a convenient format for image processing. When there is one raster image per PDF, the PDF wrapper is entirely pointless. It just adds a layer of complexity, and has no benefit.

When I scan documents, I always save images in raster image files, never as PDF files.

ImageMagick is not concerned with raster images that are inside PDF wrappers. Instead, IM is concerned with PDF pages. When IM reads a PDF, it converts each page to a raster image. It doesn't matter if the page happens to contain text or vector data or multiple raster images, IM will create a single raster image from that page.

When I receive a PDF file and I want the raster images it contains, I extract the images with "pdfimages", a common tool. I don't use IM to rasterize the pages because I care about the embedded raster images, not about the pages.

Similarly, I don't create PDF files that contain just raster images, because there is no point.

0 replies

artofit · 2021-07-25T20:05:57Z

artofit
Jul 25, 2021
Author

Interesting.

> For this reason and others, PDFs are not a convenient format for image processing.

The PDFs are not meant for editing.

What I have achieved:
I put a batch (>1000) of A4 pages of printed text into a scanner that handles at least 60 pages/min.
The pages are scanned as images, with no OCR, and each scanned paper document results in one PDF file.
Some scans I want in colour at 600 dpi; for others, grey at 150 dpi is sufficient.

What I want to achieve:
The 600 dpi colour scans are to be kept as the archive.
However, to send or access a document, I want a 150 dpi copy to save bandwidth and third-party storage.

So, how can I batch convert a high-dpi PDF, in which every page is a raster image scanned with the same parameters, to a lower resolution?

Method1:
Can IM do it in one command line?

Method2:
for every pdf
convert each page into a 150dpi jpg
then
assemble all these into a pdf

convert -density 150 "in.pdf" -quality 80 -scene 1 dpi150q80-%3d.jpg
convert *.jpg -auto-orient allJpegsIntoSingleFile.pdf

This can do the job, then it needs some programming/scripting to batch it.
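
A minimal sketch of how the batching for Method2 might look, assuming a POSIX shell with ImageMagick's convert and its Ghostscript delegate installed. The function names downsample_pdf and out_name, the "-150dpi" suffix, and the use of %03d (zero-padded page numbers, so the glob keeps pages in order) are illustrative choices, not something from the thread.

```shell
#!/bin/sh
# Hypothetical wrapper around the two convert commands above.
downsample_pdf() {
    in=$1
    out=$2
    tmp=$(mktemp -d) || return 1
    status=0
    # Step 1: rasterize every page at 150 dpi and save it as a JPEG.
    convert -density 150 "$in" -quality 80 -scene 1 "$tmp/page-%03d.jpg" || status=$?
    if [ "$status" -eq 0 ]; then
        # Step 2: assemble the page JPEGs into a single PDF.
        convert "$tmp"/page-*.jpg -auto-orient "$out" || status=$?
    fi
    rm -rf "$tmp"
    return "$status"
}

# Name of the reduced copy for a given input: scan.pdf -> scan-150dpi.pdf
out_name() {
    echo "${1%.pdf}-150dpi.pdf"
}

# Usage (commented out so that nothing runs by accident):
# for f in *.pdf; do downsample_pdf "$f" "$(out_name "$f")"; done
```

If the loop is run twice in the same directory, the earlier "-150dpi" outputs would match *.pdf and be processed again, so writing the results to a separate directory may be safer.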

However, when I compared
pdftoppm -jpeg "in.pdf" outJpegs
with
convert -density 150 "in.pdf" -quality 80 -scene 1 dpi150q80-%3d.jpg
I was surprised to see that pdftoppm gives better output, even though its quality is 75.
So I was wondering whether I am doing something wrong.

> When I receive a PDF file and I want the raster images it contains, I extract the images with "pdfimages", a common tool. I don't use IM to rasterize the pages because I care about the embedded raster images, not about the pages.

Unless I misunderstood, this extracts the original images and won't compress them. Why would IM do a lesser job?

Thanks

1 reply

fmw42 Jul 25, 2021
Collaborator

You should get the pdfimages program, extract the image that you scanned from the PDF, and find out what format it is. It could have been a NetPBM format already. There is no point in creating a high-quality PDF only to rasterize it. You should scan at high density and save as (losslessly) compressed TIFF or PNG. Use that for the archive, and then create your low-quality result by resizing in ImageMagick to JPG.

snibgo · 2021-07-26T01:14:34Z

snibgo
Jul 26, 2021

I do not suggest reading the PDFs with IM. If you do that, then Ghostscript (as the delegate of IM) will resample the raster images. Resampling will give worse quality than simply extracting the raster images.

> I was surprised to see that pdftoppm provides a better output, ...

Don't be surprised. pdftoppm is a similar program to pdfimages, from the same Poppler family.

I suggest you extract the raster images with pdfimages (or pdftoppm, if you prefer). Then resize each image with "magick in.ext -resize 25% out.ext". The output can be a PDF if you want. To assemble them into a single PDF, if you have enough memory for all the inputs, you can use "magick in*.ext -resize 25% out.pdf".
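
The suggestion above can be sketched as a small script, assuming Poppler's pdfimages and ImageMagick 7's magick are installed. The names resize_percent and shrink_scanned_pdf are hypothetical, and the sketch assumes the embedded images are in a format IM can read (for example JPEG or PNG). The 25% in the commands above is simply 150/600 expressed as a percentage.

```shell
#!/bin/sh
# Percentage for -resize when going from a source dpi ($1) to a target dpi ($2).
resize_percent() {
    echo $(( 100 * $2 / $1 ))
}

# Hypothetical helper: extract the embedded scans, shrink them,
# and write them into a single PDF.
shrink_scanned_pdf() {
    in=$1
    out=$2
    pct=$(resize_percent "$3" "$4")
    tmp=$(mktemp -d) || return 1
    status=0
    # -all writes each embedded image in the format it is stored in.
    pdfimages -all "$in" "$tmp/img" || status=$?
    if [ "$status" -eq 0 ]; then
        # All pages are held in memory at once, as noted above.
        magick "$tmp"/img-* -resize "${pct}%" "$out" || status=$?
    fi
    rm -rf "$tmp"
    return "$status"
}

# Usage: shrink_scanned_pdf in.pdf out-150dpi.pdf 600 150
```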

> Why would IM do a lesser job?

IM doesn't do a lesser job. It does a different job. IM rasterizes PDF pages. That isn't the task you want. The task you want is to extract raster images from PDF files.

1 reply

artofit Jul 26, 2021
Author

I didn't know about this (REF: https://www.scivision.dev/extracting-raw-image-from-pdf/).
To list the raw encoding of every page image in a PDF:
pdfimages -list in.pdf

to extract all raw images from PDF:
pdfimages -all in.pdf outRaw
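
For a PDF with many pages, the listing can be condensed. The sketch below assumes the usual Poppler column layout for "pdfimages -list", in which the encoding is the 9th field and x-ppi and y-ppi are the 13th and 14th; if your version prints different columns, the field numbers need adjusting. The function name summarize_list is hypothetical.

```shell
#!/bin/sh
# Read `pdfimages -list` output on stdin and print one line per distinct
# (encoding, resolution) combination, preceded by the number of images.
# The first two lines (column header and dashes) are skipped.
summarize_list() {
    awk 'NR > 2 && NF >= 14 { n[$9 " " $13 "x" $14 " ppi"]++ }
         END { for (k in n) print n[k], k }'
}

# Usage: pdfimages -list in.pdf | summarize_list
```

If every page was scanned with the same settings, this should print a single line, which is a quick way to confirm the embedded resolution before resizing.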

> IM doesn't do a lesser job. It does a different job.
Indeed

Thanks

joaqueendex · 2024-04-08T16:02:50Z

joaqueendex
Apr 8, 2024

hi everyone,
i have a similar question.
i need to convert a pdf file to 'scanned' pdf, so i use this command:

magick convert -density 300 "d:\PDFFolder\Source.pdf" -alpha remove -rotate 0.33 -attenuate 0.15 +noise Multiplicative +repage -monochrome -compress group4 "D:\PDFFolder\Output.pdf"

and i get a good result for black and white pdfs, but when i need to convert colour pdf, file size increases many times.
for example, colour copy of 1.3 KB file is 45MB. black-and-white copy is 700KB.

but when i convert file to jpeg and then back to pdf, file size barely changes.

how do i create scanned pdf version without increasing file size?
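
One possible starting point, which nobody in the thread proposed or tested: since the JPEG round trip mentioned above keeps the file small, the colour variant could ask IM to JPEG-compress the pages inside the PDF. The sketch below is the command above with -monochrome and -compress group4 replaced by -compress jpeg and -quality. The function name fake_scan_colour and the quality value 75 are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical colour variant of the "scanned look" command above.
# Every page is still rasterized at 300 dpi, but stored with lossy JPEG
# compression instead of Group 4 (which only applies to bilevel images).
fake_scan_colour() {
    magick -density 300 "$1" -alpha remove -rotate 0.33 \
        -attenuate 0.15 +noise Multiplicative +repage \
        -compress jpeg -quality 75 "$2"
}

# Usage: fake_scan_colour Source.pdf Output.pdf
```

The added noise tends to compress poorly under JPEG, so lowering -attenuate or -quality may be needed to get an acceptable size.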

0 replies

snibgo · 2024-04-08T18:48:51Z

snibgo
Apr 8, 2024

Why are you using magick convert? That limits you to the old v6 syntax. I suggest using just magick (without convert) to get the more modern v7 syntax.

If your input PDF contains just one raster image, and that is all you want, I suggest extracting that with pdfimages, not with IM.

1 reply

joaqueendex Apr 9, 2024

thank you for replying.
actually I have different files, now I'm working with an 18-page pdf, which contains both text and pictures

to be honest, this is my first time using this software and I was not aware that there is a newer version of syntax, I googled the commands. is there any way to solve my problem in the new version?..

snibgo · 2024-04-09T13:57:42Z

snibgo
Apr 9, 2024

If your PDF contains text (as vector elements, not raster), and you want that text in your result, then IM is a good tool.

To get the "new" syntax, just remove convert from your command.

Vector text doesn't take much space in the PDF. For example "hello" takes maybe 5 or 10 bytes. IM will convert this to raster (pixels), and that string might need 500 or 5000 pixels, which takes more space in the file.
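
To put rough numbers on this: the sketch below computes the uncompressed size of one rasterized page. It assumes an A4 page (about 8.27 x 11.69 inches) at the 300 dpi used in the command earlier, which gives roughly 2481 x 3507 pixels. These dimensions are an illustration, not taken from the poster's file.

```shell
#!/bin/sh
# Uncompressed size in bytes of a raster image.
#   $1: width in pixels, $2: height in pixels, $3: bits per pixel
raw_bytes() {
    echo $(( ($1 * $2 * $3 + 7) / 8 ))
}

raw_bytes 2481 3507 24   # 24-bit colour page
raw_bytes 2481 3507 1    # 1-bit black-and-white page
```

Before any compression, the colour page is about 26 MB and the bilevel page about 1 MB, which is in line with colour output being much larger than a monochrome Group 4 result.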

1 reply

joaqueendex Apr 9, 2024

i need whole file with effect of scanned document.
if i just remove 'convert' from my command, result is the same.

snibgo · 2024-04-09T17:50:45Z

snibgo
Apr 9, 2024

> i need whole file with effect of scanned document.

I don't understand what that means.

0 replies

joaqueendex · 2024-04-10T14:44:27Z

joaqueendex
Apr 10, 2024

this is processing a document so that it looks like you took a printed document and scanned it

0 replies
