How to deal with the copyright issue? #24

Open
shizhediao opened this issue Aug 11, 2020 · 1 comment

Comments

@shizhediao

Hi,
Thanks for your great dataset, which definitely speeds up scientific research!
As a fan and user of your dataset, I am curious how you deal with copyright issues:

  1. Do you have the right to distribute the submitted articles?
  2. As a user of the dataset, do I have the right to redistribute it? For example, if I apply a further processing step to your dataset for a specific research task, may I distribute the result to other people?

Thanks!
@kyleclo
Collaborator

kyleclo commented Nov 13, 2020

Hi @shizhediao, we already discussed this over email; just copying my response here for others:

Copyright is pretty tricky! We consulted with a lawyer about this for a long time, and ultimately decided that releasing this under CC BY-NC 2.0 (https://github.com/allenai/s2orc/blob/master/README.md#license) is safe. Several factors are in our favor here. We're only releasing full-text data that's derived from open-access papers. We're only allowing S2ORC for non-commercial use. And the S2ORC text isn't really usable for direct consumption of the papers (i.e. reading the paper like a PDF), since it lacks much of the content needed to read them (e.g. visual layout, figures, etc.), so we can likely argue that this falls under fair use for research.

Please take a look at the license, which should explain what you can and can't do with S2ORC and its derivations with respect to redistribution. In short, yes: we're hoping that researchers will use S2ORC as a "meta" corpus from which to derive further task-specific NLP datasets that they can distribute.
