Detecting generated scientific papers

Description

This competition is part of a shared task hosted at the Third Workshop on Scholarly Document Processing (SDP 2022), held in association with the 29th International Conference on Computational Linguistics (COLING 2022).

There are increasing reports that research papers can be written by computers, which raises a series of concerns (see, e.g., [1]). In this challenge, we explore the state of the art in detecting automatically generated papers. We frame the detection problem as a binary classification task: given an excerpt of text, label it as either human-written or machine-generated. We provide a corpus of over 5000 excerpts from automatically written papers, based on the work by Cabanac et al. [2], as well as documents collected by Elsevier publishing and editorial teams. As a control set, we provide a corpus five times larger of openly accessible human-written as well as generated papers from the same scientific domains. We also encourage contributions that aim to extend this dataset with other computer-generated scientific papers, or papers that propose valid metrics for comparing automatically generated papers with those written by humans.
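To make the binary framing concrete, here is a minimal sketch of a phrase-lookup baseline together with precision and recall for the machine-generated class. It is not the approach taken in this repository's notebooks, which use BERT. The label encoding, the function names, and the short phrase list are assumptions made for illustration; the phrases are examples of the kind reported in [1] and [2].

```python
# Hypothetical label encoding; the competition data may encode labels differently.
HUMAN, GENERATED = 0, 1

# A few "tortured phrases" of the kind reported in [1] and [2].
# Illustrative only: a real list would be far longer.
TORTURED_PHRASES = (
    "counterfeit consciousness",     # for "artificial intelligence"
    "profound neural organization",  # for "deep neural network"
    "colossal information",          # for "big data"
)


def predict(excerpt: str) -> int:
    """Label an excerpt GENERATED if it contains any listed phrase, else HUMAN."""
    text = excerpt.lower()
    return GENERATED if any(phrase in text for phrase in TORTURED_PHRASES) else HUMAN


def precision_recall(y_true, y_pred, positive=GENERATED):
    """Return (precision, recall) for the positive class; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    excerpts = [
        "We apply counterfeit consciousness to colossal information.",
        "We apply artificial intelligence to big data.",
    ]
    labels = [GENERATED, HUMAN]
    predictions = [predict(e) for e in excerpts]
    print(predictions, precision_recall(labels, predictions))
```

A lookup like this only catches excerpts that happen to contain a listed phrase, so it is best read as a sanity-check baseline to compare a learned classifier against.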

Acknowledgements

We thank Cyril Labbé, Basile Dubois-Binnaire, Guillaume Cabanac, and Alexander Magazinov for their input in the ideation phase of the task preparation.

Links

[1] Holly Else. (2021). "'Tortured phrases' give away fabricated research papers." Nature.

[2] Guillaume Cabanac, Cyril Labbé, and Alexander Magazinov. (2021). "Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals."

