Skip to content

Creates a lookup table that shows on which pages each word occurs

License

Notifications You must be signed in to change notification settings

bertyhell/PdfPageLookup

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PdfPageLookup

This windows executable allows you to select a pdf file and it will output each unique word in the pdf with every page that word occurs on to your clipboard.

Overview of how the app works

Example:

  • select a pdf file
  • the program will list all the words
  • next ot each word it shows all the pages that word occurs in
  • This list is copied to your clipboard, so you can paste it whereever you want

Removing noise words:

  • The zip contains a file: noisewords.txt
  • You can replace the content of this file with the words you want to omit from the list
  • A good source for words to remove in your language can be found here: http://www.ranks.nl/stopwords/

Technology

WPF and i'm using the pdf library IText