Paper-Analysis

External repos used

NLTK
pdfminer https://pypi.python.org/packages/source/p/pdfminer/pdfminer-20131113.tar.gz

###Description: The main aim of this project is to find the most frequently repeated topics in previous years' exam papers, a task that might otherwise take some time.

Deliverables: This project is still incomplete, as I am waiting for scanned copies of the previous years' papers from the university where I am doing my B.E.

###Things done

Scans the PDF files and extracts text data from them.
Tokenizes the text files into words.
Deletes irrelevant words from the corpus, such as determiners, verbs, etc.
Gives the frequency distribution of the most frequent words in the sample document.
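
The tokenize, filter, and count steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the project uses NLTK, whose `FreqDist` is a subclass of `collections.Counter`. The names `IRRELEVANT_WORDS`, `tokenize`, `remove_irrelevant`, and `frequency_distribution` are hypothetical, and the small hand-written word set is an assumed stand-in for filtering out determiners and verbs.

```python
import re
from collections import Counter

# Hypothetical hand-written set standing in for the determiner/verb filter.
IRRELEVANT_WORDS = {
    "a", "an", "the", "of", "and", "in", "with",
    "is", "are", "was", "explain", "define", "describe",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_irrelevant(tokens, irrelevant=IRRELEVANT_WORDS):
    """Drop every token that appears in the irrelevant-word set."""
    return [token for token in tokens if token not in irrelevant]


def frequency_distribution(text, top_n=5):
    """Return the top_n (word, count) pairs after tokenizing and filtering."""
    tokens = remove_irrelevant(tokenize(text))
    return Counter(tokens).most_common(top_n)


sample = "Explain the binary tree. Define a binary search tree. Describe the heap."
print(frequency_distribution(sample, top_n=3))
```

With NLTK available, the regular expression could be swapped for one of its tokenizers and the word set for part-of-speech filtering; the counting step would stay essentially the same.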

###Things to do

Get digitized copies of previous years' question papers.
Make a list of all the topics in the syllabus.
Cross-check whether the tokenized list contains the topics in the syllabus and build a frequency distribution table accordingly.
Plot the results in a graph.
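
One possible shape for the planned cross-check is sketched below, again with the standard library only. Everything here is an assumption for illustration: the function names `normalize`, `count_topics`, and `print_bar_chart`, the sample papers, and the topic list are all hypothetical, and the text bar chart merely stands in for a real plot.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and reduce it to space-separated alphabetic words."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def count_topics(paper_texts, topics):
    """Count whole-word occurrences of each syllabus topic across all papers."""
    counts = Counter({topic: 0 for topic in topics})
    for text in paper_texts:
        normalized_text = normalize(text)
        for topic in topics:
            needle = normalize(topic)
            if not needle:
                continue  # skip topics with no alphabetic content
            # \b keeps "heap" from matching inside a longer word like "heaps".
            pattern = r"\b" + re.escape(needle) + r"\b"
            counts[topic] += len(re.findall(pattern, normalized_text))
    return counts


def print_bar_chart(counts):
    """Print a plain-text bar chart, most frequent topic first."""
    for topic, n in counts.most_common():
        print(f"{topic:<20} {'#' * n} ({n})")


# Hypothetical sample data; real input would come from the extracted papers.
papers = [
    "Explain binary search tree. What is a heap?",
    "Construct a heap. Define binary search tree traversal.",
]
topics = ["binary search tree", "heap", "graph"]

counts = count_topics(papers, topics)
print_bar_chart(counts)
```

Matching on normalized text rather than on single tokens lets multi-word topics such as "binary search tree" be counted as one unit.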
