Running PDFBox on a Node.js server
Right now only a simple first pass works: it parses the text of a PDF and prints it to the console. Next, I would like to process the PDF further to understand its content, for example by running analytics on the entities the document contains. I would also like to categorize documents by type using KNN (k-nearest neighbors) and display graphs.
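None of the KNN categorization exists yet. As a rough idea of what it could look like, here is a minimal sketch that classifies extracted text by majority vote among the k most similar labeled examples. The function names (`termCounts`, `cosine`, `knnClassify`), the bag-of-words features, and the cosine similarity measure are all assumptions made for illustration, not part of this project.

```javascript
// Hypothetical helper: turn text into a bag-of-words count map.
function termCounts(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two count maps (0 when either one is empty).
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, n] of a) {
    dot += n * (b.get(word) || 0);
    normA += n * n;
  }
  for (const n of b.values()) {
    normB += n * n;
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Hypothetical classifier: majority vote among the k most similar examples.
// `examples` is an array of { text, label } objects; returns null if it is empty.
function knnClassify(text, examples, k = 3) {
  if (examples.length === 0) return null;
  const query = termCounts(text);
  const nearest = examples
    .map((ex) => ({ label: ex.label, score: cosine(query, termCounts(ex.text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
  const votes = new Map();
  for (const { label } of nearest) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }
  return [...votes.entries()].sort((x, y) => y[1] - x[1])[0][0];
}
```

In this sketch, the text passed to `knnClassify` would be whatever the parsing pass currently prints to the console, and the labeled examples would have to be collected by hand.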
To run: First, execute
node index.js
then navigate to localhost:8000.
- upload a PDF to the server
- the parsed text will be displayed in the console
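For readers who want a feel for the upload step, the following is a minimal sketch of a Node.js server that accepts a PDF and passes its bytes to a callback. It is not the actual index.js: `looksLikePdf` and `createUploadServer` are hypothetical names, the sketch assumes the PDF arrives as the raw body of a POST request rather than through a browser form, and the hand-off to PDFBox is left out entirely.

```javascript
const http = require('http');

// Hypothetical helper: PDF files begin with the magic bytes "%PDF-".
function looksLikePdf(buffer) {
  return buffer.length >= 5 && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}

// Hypothetical factory: builds a server that collects the raw POST body
// and passes it to onPdf(buffer). Assumes no multipart form encoding.
function createUploadServer(onPdf) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405, { 'Content-Type': 'text/plain' });
      res.end('POST a PDF as the raw request body\n');
      return;
    }
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const body = Buffer.concat(chunks);
      if (!looksLikePdf(body)) {
        res.writeHead(400, { 'Content-Type': 'text/plain' });
        res.end('not a PDF\n');
        return;
      }
      onPdf(body); // this is where the text extraction would be triggered
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('received\n');
    });
  });
}
```

To try the sketch, call `createUploadServer((pdf) => console.log(pdf.length)).listen(8000)` and send a file with `curl --data-binary @file.pdf localhost:8000`, where `file.pdf` is a placeholder for any local PDF.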