Plag checker #29

Open
pratt0007 opened this issue Oct 26, 2023 · 3 comments
Comments

@pratt0007
Every solution a user submits, together with its submission time, is stored in our database.
If the website could count how many times a specific piece of code has been submitted, then whenever that count exceeds a predetermined threshold we could mark the submission in red to indicate that the code has been copied.
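
The counting idea above could be sketched roughly as follows. This is only an illustration: `fingerprint`, `flag_copied`, and `DUPLICATE_THRESHOLD` are hypothetical names, the threshold value is arbitrary, and it assumes the submitted code is actually available as text.

```python
import hashlib
from collections import defaultdict

# Hypothetical threshold: flag code shared by more than this many distinct users.
DUPLICATE_THRESHOLD = 3


def fingerprint(code):
    """Return a stable digest of the code, ignoring trailing whitespace and outer blank lines."""
    normalized = "\n".join(line.rstrip() for line in code.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def flag_copied(submissions, threshold=DUPLICATE_THRESHOLD):
    """Take (user, code) pairs and return the users whose code exceeds the threshold."""
    users_by_digest = defaultdict(set)
    for user, code in submissions:
        users_by_digest[fingerprint(code)].add(user)

    flagged = set()
    for users in users_by_digest.values():
        if len(users) > threshold:
            flagged |= users
    return flagged
```

Exact matching like this only catches verbatim copies; renaming a single variable is enough to evade it.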

We could use existing plagiarism detection techniques:

  1. Text-based plagiarism detection: popular tools include Turnitin and Copyscape.
  2. Code similarity algorithms: tools such as MOSS (Measure Of Software Similarity), Simian, and JPlag are examples.
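
To give a feel for what a code similarity measure can look like, here is a minimal sketch that compares two submissions by the overlap of their token k-grams (Jaccard similarity). It is a generic textbook-style technique, not the algorithm used by any of the tools named above, and the function names and the choice of `k=4` are assumptions made for illustration.

```python
import re


def tokens(code):
    """Crude lexer: identifiers, integers, or single non-space symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def shingles(code, k=4):
    """Return the set of all runs of k consecutive tokens."""
    toks = tokens(code)
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard_similarity(a, b, k=4):
    """Return shared k-grams divided by total distinct k-grams (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty (or very short) inputs are treated as identical
    return len(sa & sb) / len(sa | sb)
```

Because the comparison works on tokens, differences in spacing and line breaks do not affect the score, but renamed variables still do.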

ML integration: train machine learning models to identify code plagiarism. Techniques such as natural language processing (NLP) and deep learning could be used to analyze and compare code submissions.

@baoliay2008
Owner

Hi @pratt0007, thank you very much for your suggestion. I genuinely appreciate your input.

I did consider this feature, but at this time we can't implement it as effectively as the LeetCode platform can. For now, we don't save users' submission code; we only store the submission datetime. The LeetCode platform has much more informative data, such as users' IP addresses (many cheating incidents involve users at the same school or one person using two accounts).

@pratt0007
Author

Thank you so much, @baoliay2008, for going through my suggestions. I think we could scrape the data from LeetCode with a Python package or something similar and then, once we have the database, apply specific algorithms to check for code plagiarism.

@Kaushik-sss

@pratt0007 Aside from your approach, I can think of one other way to check for plagiarism: scrape the user's submission, then remove all whitespace, comments, unused variables and functions (this sounds complex because it is), print statements, and any other statements that don't contribute to solving the problem. After that:

  • Hash each word in the code and combine the results, for example by collecting the hashes in a list (far easier).
  • Replace all variable and function names in each solution with a, b, c, ..., z, aa, ab, and so on, much as JavaScript minifiers and text obfuscators do, to produce a normalized version of the code.

Then compute a hash of the result with Python's hash(), keep the hashes in a dictionary, and check whether each new hash is already present or newly found. The main problem you will run into is identifying the keywords of each particular language. Another problem arises when a question can only be solved in a limited number of ways, since honest solutions will then look alike. I would be happy to work on this.
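
A minimal sketch of the normalize-then-hash idea, assuming the submissions are valid Python source (other languages would need their own tokenizer). The names `normalize`, `code_hash`, and `check_submission` are hypothetical. It only strips comments and layout and renames identifiers; removing unused variables or print statements is not attempted. It uses `hashlib` rather than the built-in `hash()`, because `hash()` on strings is randomized per process by default and so cannot be stored and compared across runs.

```python
import builtins
import hashlib
import io
import keyword
import tokenize

BUILTIN_NAMES = set(dir(builtins))
# Token types that carry only comments or layout. Dropping them is lossy
# (indentation is ignored), which is acceptable for a rough fingerprint.
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalize(source):
    """Drop comments and layout; rename user identifiers to v0, v1, ... by first use."""
    names = {}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        is_user_name = (tok.type == tokenize.NAME
                        and not keyword.iskeyword(tok.string)
                        and tok.string not in BUILTIN_NAMES)
        if is_user_name:
            out.append(names.setdefault(tok.string, f"v{len(names)}"))
        else:
            out.append(tok.string)
    return " ".join(out)


def code_hash(source):
    """Return a stable digest of the normalized source."""
    return hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()


def check_submission(user, source, seen):
    """Return the earlier submitter of equivalent code, or None if it is new."""
    digest = code_hash(source)
    if digest in seen:
        return seen[digest]
    seen[digest] = user
    return None
```

A possible usage: keep one `seen` dictionary per question, call `check_submission` for each submission in time order, and treat a non-`None` result as a candidate for manual review rather than as proof of copying.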


3 participants