huyndao/redact_pdf

Motivation

I recently had to submit my academic transcripts along with a job application.
The company required that PII (Personally Identifiable Information) be redacted, since it could not guarantee that such information would be properly safeguarded. As such, I needed a way to redact PDF files that is secure and irreversible.

Prior Art

I looked into several options, but none were ideal for my needs:

  • Overlay only: some tools just draw a box over the PDF content, which can potentially be undone in many ways (for example, by extracting and viewing the PDF source)
  • Cost: not free
  • Proprietary: no way to be sure what goes on under the hood
  • SaaS: requires uploading your PII-laden PDF to a cloud service in order to redact it (helllll... no)

There is a good option in Dangerzone. It is an open source project created by Micah Lee, a software developer at The Intercept. It is a GUI tool that can convert many file types (not just PDF) to a safe and redacted PDF, and it is worth checking out. Even if you don't plan on using it, the Dangerzone page is still worth reading.

Script

However, Dangerzone was more than I needed, and I hate firing up Docker just to run an app, so I wrote this quick and dirty shell script that uses open source software to do something similar.

It uses the following open source apps, which you may or may not already have installed:

How it works

Essentially, it

  • decrypts the PDF (if encrypted with an owner password)
  • splits it into pages
  • converts each split page into a 600 dpi by 600 dpi PNG image
  • opens GIMP for each image and waits for you to redact / draw black boxes over your sensitive information
  • after you're done, just save the split image file (overwriting it)
  • then it converts each redacted image back into a PDF
  • and merges them into a single PDF file named [original_name_here]_redacted.pdf
  • That's it!
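
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script from this repository: apart from GIMP, the tool choices (qpdf for decrypting, splitting and merging, pdftoppm from poppler-utils for rendering, img2pdf for converting images back to PDF) are assumptions, and the function names `redacted_name` and `redact_pdf` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. qpdf, pdftoppm and img2pdf are assumed tool choices;
# only GIMP is named in the description above.
set -euo pipefail

# Build the output name "[original_name_here]_redacted.pdf" from an input path.
redacted_name() {
    local base
    base=$(basename "$1")
    printf '%s_redacted.pdf\n' "${base%.pdf}"
}

redact_pdf() {
    local input=$1 output work page stem
    local -a parts=()
    output=$(redacted_name "$input")
    work=$(mktemp -d)

    # Decrypt (drops owner-password restrictions), then split into one PDF per page.
    # qpdf replaces %d with a zero-padded page number, so the glob below stays in page order.
    qpdf --decrypt "$input" "$work/plain.pdf"
    qpdf --split-pages "$work/plain.pdf" "$work/page-%d.pdf"

    for page in "$work"/page-*.pdf; do
        stem=${page%.pdf}
        # Render the single page to "$stem.png" at 600 dpi.
        pdftoppm -r 600 -png -singlefile "$page" "$stem"
        # Start a separate GIMP instance and wait until it is closed.
        # Draw the black boxes, then overwrite the PNG before quitting.
        gimp --new-instance "$stem.png"
        # Wrap the redacted image in a PDF again.
        img2pdf "$stem.png" -o "$stem.redacted.pdf"
        parts+=("$stem.redacted.pdf")
    done

    # Merge the redacted pages into one file in the current directory.
    qpdf --empty --pages "${parts[@]}" -- "$output"

    # The work directory still holds unredacted copies, so remove it.
    rm -rf "$work"
    echo "Wrote $output"
}

# Run only when a PDF path is given on the command line.
if [[ $# -gt 0 ]]; then
    redact_pdf "$1"
fi
```

Saved as, say, `redact_sketch.sh` (a made-up file name), it would be run as `bash redact_sketch.sh transcript.pdf`. Because every page goes through a raster image, the output PDF contains only pixels: the text under a black box is no longer present in the file, which is what makes the redaction irreversible.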