Plagiarism checker (NLTK part 2?) #135

Open
mridubhatnagar opened this issue Nov 14, 2017 · 3 comments

Comments

@mridubhatnagar
Contributor

mridubhatnagar commented Nov 14, 2017

  1. Converting Word documents to PDF format.
  2. Merging multiple PDFs into a single PDF.
  3. Plagiarism checker.
  4. Adding a Word doc, Excel file, or any other file to a folder and converting the folder into a zip archive.
  5. Extracting data from a PDF file. (There was a talk at PyCon India 2017 related to this.)
  6. Extracting data from applications, integrating with Google Docs, and visualizing the data.

Some good use cases are needed, though.
Automating daily tasks would be fun, I guess.

mridubhatnagar changed the title from "Automation of Tasks" to "Automation of Tasks Challenge" on Nov 14, 2017
@bbelderbos
Collaborator

bbelderbos commented Nov 14, 2017

There are some good opportunities there, thanks.

As 3/4 concern docs, I will rename it to office tasks.

How would you go about item 3?

pybites changed the title from "Automation of Tasks Challenge" to "Automation of Office Tasks" on Nov 14, 2017
@mridubhatnagar
Contributor Author

Hmm... I will have to think about it.
Actually, while doing challenge-03 I was looking for ways to measure the similarity between words.
That is where I came across plagiarism checkers. Maybe something can be done with the NLTK module.
I guess the percentage of similarity between two docs can be calculated.
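
A minimal sketch of that last idea, assuming the two docs are already available as plain-text strings. It uses `difflib` from the standard library rather than NLTK, and the function names `tokenize` and `similarity_percentage` are made up for illustration. An NLTK tokenizer could be swapped in for the regex if better word splitting is needed.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_percentage(doc_a, doc_b):
    """Return how similar two documents are, as a percentage from 0 to 100."""
    tokens_a, tokens_b = tokenize(doc_a), tokenize(doc_b)
    if not tokens_a and not tokens_b:
        return 100.0  # assumption: two empty documents count as identical
    # ratio() is 2 * matched tokens / total tokens, so it lies between 0 and 1
    matcher = SequenceMatcher(None, tokens_a, tokens_b, autojunk=False)
    return round(matcher.ratio() * 100, 1)


if __name__ == "__main__":
    print(similarity_percentage("The quick brown fox", "The quick red fox"))  # 75.0
```

Comparing word tokens instead of raw characters keeps the score from being dominated by whitespace and punctuation. A real checker would probably also want stemming or n-gram overlap, which is where NLTK could help.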

pybites changed the title from "Automation of Office Tasks" to "Plagiarism checker (NLTK part 2?)" on Jan 8, 2019
@pybites
Owner

pybites commented Jan 8, 2019

Focussing the challenge idea on the plagiarism checker, as we will tackle working with PDF files in PCC60.
