GitHub - noahpryor/pdflib: (JRuby) extract text and tables from PDFs using Mozilla's tabula-extract or just straight-up OCR.

	# install JRuby first, then the gem dependencies
	bundle install
	# run with the example schema
	bundle exec jruby page_extractor.rb

.bundle
Gemfile
Gemfile.lock
LICENSE.md
README.md
page_extractor.rb
pdf_extract.rb
pdf_spec.rb
settings.rb
temp-file.pdf
temp-file_1.png
