GitHub - sidmishraw/pdf_processor: IEEE Xplore PDFs to JSON conversion utility

IEEE Xplore PDF - JSON converter

Motivation and Objective

Unlike markup languages, the source of a PDF document is difficult to comprehend when an application reads it in, which makes it hard to extract words from it. This utility tackles that problem by extracting the words from a PDF document's source and converting them into JSON.

The JSON has the following structure:

{
  "Page#1": [
    "Observational",
    "Calculi",
    "Classes",
    "of",
    "Association",
    "Rules",
    "and",
    "F-property"
  ]
}

Page#1 represents page 1 of the PDF document, and its value is the list of words occurring on that page.
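A minimal sketch of how this structure could be assembled, assuming the text of each page has already been extracted as a plain string. The function name `pages_to_json` is hypothetical and is not taken from this repository; splitting on whitespace is also an assumption about what counts as a word.

```python
import json


def pages_to_json(page_texts):
    """Build the "Page#N" -> list-of-words mapping and serialize it as JSON.

    page_texts is a list of strings, one per page, in page order.
    Words are assumed to be whitespace-delimited (an illustrative choice).
    """
    pages = {}
    for number, text in enumerate(page_texts, start=1):
        # str.split() with no arguments splits on any run of whitespace.
        pages["Page#{}".format(number)] = text.split()
    return json.dumps(pages, indent=2)


if __name__ == "__main__":
    print(pages_to_json(["Observational Calculi Classes of Association Rules and F-property"]))
```

The sketch uses `str.format` rather than f-strings so that it stays compatible with Python 3.5.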

Note: At the moment, this utility only focuses on PDF documents found on the IEEE Xplore website.

Note: This utility makes use of the PDFMiner utility made by Yusuke Shinyama.
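As a rough illustration of page-wise extraction, the sketch below assumes the `pdfminer.six` fork of PDFMiner and its `extract_pages` helper. This may not be the PDFMiner variant or the API that this repository actually uses, and recent `pdfminer.six` releases need a newer Python than 3.5. The function names `extract_page_texts` and `word_map` are hypothetical.

```python
def extract_page_texts(pdf_path):
    """Return one text string per page of the PDF at pdf_path.

    Assumes pdfminer.six is installed; imported lazily so that the
    rest of this module can be used without it.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    page_texts = []
    for page_layout in extract_pages(pdf_path):
        # Keep only layout elements that carry text (boxes, lines).
        chunks = [
            element.get_text()
            for element in page_layout
            if isinstance(element, LTTextContainer)
        ]
        page_texts.append("".join(chunks))
    return page_texts


def word_map(page_texts):
    """Map "Page#N" to the whitespace-delimited words on page N."""
    return {
        "Page#{}".format(number): text.split()
        for number, text in enumerate(page_texts, start=1)
    }
```

With `pdfminer.six` installed, `word_map(extract_page_texts("paper.pdf"))` would yield a dictionary that can be passed to `json.dumps`; `paper.pdf` is a placeholder path.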

Repository files:

pdf_processing
.gitignore
Readme.md
pdf_downloader_script.py
pdf_processing_demo.py
requirements.txt

Minimum requirement: Python v3.5.2
