How to deal with the copyright issue? #24

shizhediao · 2020-08-11T15:10:47Z

Hi,
Thanks for your great dataset which definitely speeds up scientific research!
As a fan and user of your dataset, I am really curious how you deal with copyright issues.

Do you have the right to distribute the submitted articles?
As a user of the dataset, do I have the right to redistribute it? For example, if I apply a further processing step to your dataset for a specific research task, could I distribute the result to other people?
Thanks!

kyleclo · 2020-11-13T15:01:05Z

Hi @shizhediao, we already discussed this over email; just copying my response here for others:

Copyright is pretty tricky! We consulted with a lawyer about this for a long time, and ultimately decided that releasing this under CC BY-NC 2.0 (https://github.com/allenai/s2orc/blob/master/README.md#license) is safe. There are a variety of factors in our favor here: We're only releasing full-text data that's derived from open-access papers. We're only allowing S2ORC for non-commercial use. And the S2ORC text isn't really usable for direct consumption of the papers (i.e., reading the paper like a PDF) and doesn't contain a lot of the content necessary to read the paper (e.g., visual layout, figures, etc.), so we can likely argue that this falls under fair use for research.

Please take a look at the license, which should explain what you can and can't do with S2ORC and its derivations with respect to redistribution. In short, yes: what we're hoping for is that researchers will use S2ORC as a "meta" corpus to derive further task-specific NLP datasets that they can distribute.
