Re-implementation of open source methods in another language #16

Open
arokem opened this issue Apr 18, 2016 · 17 comments

@arokem

arokem commented Apr 18, 2016

Dear ReScience editors, in the course of our work we have created a Python implementation of a method that was previously available as open-source R code. Is this implementation within the scope of ReScience? Thanks! cc: @kpolimis, @bhazelton

@rougier
Member

rougier commented Apr 19, 2016

Do you have a reference paper to target for the replication?

@arokem
Author

arokem commented Apr 19, 2016

Yes. This is the paper: http://jmlr.csail.mit.edu/papers/volume15/wager14a/wager14a.pdf, and the previous implementation: https://github.com/swager/randomForestCI

@rougier
Member

rougier commented Apr 19, 2016

It is okay as long as you do not make a simple "translation" of the R code. The idea of the replication is really to check whether the original article is self-sufficient in describing the method or model (i.e. without the accompanying code), or whether some information is incorrect or missing. In the end, the original article + your article should be sufficient for future replications.

@khinsen What do you think?

@FedericoV

This seems like a better fit for a project like http://contrib.scikit-learn.org/ to me.

@rougier
Member

rougier commented Apr 19, 2016

I didn't know about this. In any case, you can do both (publication and contribution).

@khinsen
Contributor

khinsen commented Apr 20, 2016

@rougier I agree: the main point of ReScience is doing replication in the sense of writing a new implementation that should produce results identical to published ones. If a published implementation already exists, we should ask for a "clean-room reimplementation", although this of course cannot be verified.

In my personal experience, a second independent implementation is a great way to find mistakes (in both implementations), so I am tempted to suggest that we even encourage that kind of submission for ReScience.

@FedericoV

There's an additional issue as well: most R code is licensed under the GPL, while most Python code is licensed under MIT. If the code is a clean-room implementation, you can use an MIT/BSD license, while if it is a derivative work of the R code, you have to license it under the GPL, which limits its use within Python.

@rougier
Member

rougier commented Apr 20, 2016

In my view, a replication is not a derivative work.

@FedericoV

I am not a lawyer, but I believe that if you look at GPL code while you implement the Python code, the result counts as a derivative work.

@arokem
Author

arokem commented Apr 20, 2016

That might be relevant in general, but not in this particular case. We did not do a "clean room" implementation (IIUC what that would entail). Instead, we looked at the original code, which in this case is under an MIT license.

@rougier
Member

rougier commented Apr 20, 2016

You of course have the right to look at the code, but the idea is to start from the paper and to look at the code only if a piece of information is missing from the paper or something remains obscure. Otherwise, if the original author made a mistake, you could end up just carrying that mistake over into your code.

@oliviaguest
Member

When you say "mistake", you mean a bug (of whatever type), as opposed to a mistake in the journal article, right?

@rougier
Member

rougier commented Apr 20, 2016

No, I mean a mistake in the code in the sense that the code does not implement what is advertised in the paper. For example, you can write that you're integrating an equation using the Runge-Kutta numerical method while the code actually uses the explicit Euler method. In some cases this won't make a difference, but in other cases it could lead to different results, and hence it must be reported in the new article.
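
To make the Runge-Kutta versus explicit Euler example concrete, here is a minimal Python sketch. The test equation dy/dt = -y, the step count, and all function names are assumptions chosen for illustration; they are not taken from any paper or code discussed in this thread.

```python
import math


def f(t, y):
    # Hypothetical test equation dy/dt = -y, exact solution y(t) = exp(-t).
    return -y


def euler_step(f, t, y, h):
    # Explicit Euler: first-order accurate.
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    # Classical fourth-order Runge-Kutta.
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, t_end, n_steps):
    # Advance from t = 0 to t = t_end with a fixed step size.
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


exact = math.exp(-1.0)
euler_error = abs(integrate(euler_step, f, 1.0, 1.0, 10) - exact)
rk4_error = abs(integrate(rk4_step, f, 1.0, 1.0, 10) - exact)
print(f"Euler error: {euler_error:.2e}, RK4 error: {rk4_error:.2e}")
```

With these settings the Euler error is on the order of 1e-2 while the RK4 error stays below 1e-5, so a paper that reports one method while the code uses the other can plausibly produce numbers a replication cannot match.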

@oliviaguest
Member

I think we disagree on terminology, but not on the solution. If the implementation (code) doesn't match the specification (journal article), I would class that both as a bug (a mistake in the code, specifically a logic error) and as a mistake in the journal article.

@oliviaguest
Member

oliviaguest commented Apr 20, 2016

To clarify, in case it is not clear from the above, I do not mean that the presence of logic errors means there is a mistake in the journal article. There might be many logic errors without any mistakes in the article, merely because in those specific cases it does not matter that the logic errors exist. But any mismatch between the reported specification and the implementation directly implies a mistake in the journal article in the case where the journal article serves as the only spec.

@rougier
Member

rougier commented Apr 20, 2016

I agree. This is precisely the goal of replication in ReScience: to spot such mistakes (and also missing information) and to report them, such that the two articles (original + replication) now constitute a complete spec. For me, the added value of replications in ReScience is more the article than the code.

For me, bugs (or errors) are something different (and worse) because they can invalidate results. For example, imagine you're using a fixed seed in your random generator (for debugging) and you forget to remove it before computing statistics over several runs of your model. This may very well invalidate all the results.
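
A minimal Python sketch of the fixed-seed scenario, under stated assumptions: `run_model` is a hypothetical stand-in for a stochastic model, and the seed values and run counts are arbitrary.

```python
import random
import statistics


def run_model(seed):
    # Hypothetical stochastic model: the mean of 100 Gaussian draws.
    rng = random.Random(seed)
    return statistics.mean([rng.gauss(0.0, 1.0) for _ in range(100)])


# Bug: the debug seed was left in place, so every "independent" run is identical.
buggy_runs = [run_model(seed=42) for _ in range(20)]

# Intended behaviour (one possible fix): a distinct, recorded seed per run.
correct_runs = [run_model(seed=s) for s in range(20)]

buggy_spread = statistics.pstdev(buggy_runs)
correct_spread = statistics.pstdev(correct_runs)
print(f"spread with fixed seed: {buggy_spread}, with varied seeds: {correct_spread}")
```

In the buggy case all 20 runs return the same value, so the measured spread is zero and any statistic over "several runs" really describes a single run.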

@oliviaguest
Member

I think it's just terminology/jargon that we disagree on. Basically 100% agreed. 😄

pdebuyl pushed a commit to pdebuyl/ReScience-submission that referenced this issue Feb 26, 2019