Re-implementation of open source methods in another language #16

arokem · 2016-04-18T21:41:36Z

Dear ReScience editors, in the course of our work, we have created a Python implementation of a method that was previously available as open-source R code. Is this implementation within the scope of ReScience? Thanks! cc: @kpolimis, @bhazelton

rougier · 2016-04-19T06:27:12Z

Do you have a reference paper to target for the replication?

arokem · 2016-04-19T14:11:54Z

Yes. This is the paper: http://jmlr.csail.mit.edu/papers/volume15/wager14a/wager14a.pdf, and the previous implementation: https://github.com/swager/randomForestCI

rougier · 2016-04-19T19:23:22Z

It is okay as long as you do not make a simple "translation" of the R code. The idea of the replication is really to check whether the original article is self-sufficient when describing the method or model (i.e. without the accompanying code), or whether some information is incorrect or missing. In the end, the original article + your article should be sufficient for future replications.

@khinsen What do you think?

FedericoV · 2016-04-19T19:26:57Z

This seems like a better fit for a project like http://contrib.scikit-learn.org/ to me.

rougier · 2016-04-19T19:34:12Z

Didn't know this. But anyway, you can do both actually (publication and contribution).

khinsen · 2016-04-20T15:57:44Z

@rougier I agree: the main point of ReScience is doing replication in the sense of writing a new implementation that should produce results identical to published ones. If a published implementation already exists, we should ask for a "clean-room reimplementation", although this of course cannot be verified.

In my personal experience, a second independent implementation is a great way to find mistakes (in both implementations), so I am tempted to suggest that we even encourage that kind of submission for ReScience.

FedericoV · 2016-04-20T16:01:32Z

There's an additional issue as well: most code in R is GPL-licensed, while most Python code is MIT-licensed. If the code is a clean-room implementation, you can use MIT/BSD as a license, while if it is a derivative work of the R code, you have to release it under the GPL, which limits its use within Python.

rougier · 2016-04-20T16:02:37Z

Replication is not a derivative work for me.

FedericoV · 2016-04-20T16:04:22Z

I am not a lawyer, but I believe that if you look at GPL code while you implement the Python code, it counts as derivative.

arokem · 2016-04-20T16:06:17Z

Might be generally of relevance, but not for this particular case. We did not do a "clean room" implementation (IIUC what that would entail). Instead, we looked at the original code, but in this case, it is under an MIT license.

rougier · 2016-04-20T16:19:57Z

You of course have the right to look at the code, but the idea is to start from the paper and to look at the code only if a piece of information is missing from the paper or something remains obscure. Otherwise, if the original author made a mistake, you could end up just translating that mistake into your code.

oliviaguest · 2016-04-20T17:13:39Z

When you say mistake you mean a bug (of whatever type) as opposed to a mistake in the journal article, right?

rougier · 2016-04-20T17:57:04Z

No, I mean a mistake in the code in the sense that the code does not implement what is advertised in the paper. For example, you can write that you are integrating an equation using the Runge-Kutta numerical method while the code actually uses the explicit Euler method. In some cases this won't make a difference, but in other cases it could lead to different results, and hence it must be reported in the new article.
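To make the Runge-Kutta versus explicit Euler example concrete, here is a minimal sketch in Python. It is not taken from any paper or code discussed in this thread: the test equation dy/dt = y, the step count, and all function names are assumptions chosen purely for illustration.

```python
import math


def euler_step(f, t, y, h):
    """Advance y by one explicit (forward) Euler step of size h."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """Advance y by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(step, f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 using a fixed-step scheme."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def growth(t, y):
    """Hypothetical test problem: dy/dt = y, whose exact solution is exp(t)."""
    return y


# Same equation, same step size, two different schemes.
y_euler = integrate(euler_step, growth, 1.0, 0.0, 1.0, 10)
y_rk4 = integrate(rk4_step, growth, 1.0, 0.0, 1.0, 10)

print(f"exact solution : {math.e:.6f}")
print(f"explicit Euler : {y_euler:.6f} (error {abs(y_euler - math.e):.2e})")
print(f"Runge-Kutta 4  : {y_rk4:.6f} (error {abs(y_rk4 - math.e):.2e})")
```

With these settings the two schemes disagree visibly, so a paper that announces one while the code uses the other would be hard to replicate from the text alone. With a much smaller step size the gap would shrink, which is the "won't make a difference" case.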

oliviaguest · 2016-04-20T18:53:29Z

I think we disagree on terminology, but not on the solution. If the implementation (code) doesn't match the specification (journal article), I would class that as a bug (a mistake in the code; specifically, it can be seen as a logic error) and as a mistake in the journal article.

oliviaguest · 2016-04-20T19:00:55Z

To clarify, in case it is not clear from the above, I do not mean that the presence of logic errors means there is a mistake in the journal article. There might be many logic errors without any mistakes in the article, merely because it does not matter in those specific cases that logic errors exist. But all mismatches between the reported specification and the implementation directly require/imply a mistake in the journal article in the case where the journal article serves as the only spec.

rougier · 2016-04-20T20:12:19Z

I agree. This is precisely the goal of replication in ReScience: to spot such mistakes (and also missing information) and to report them so that the two articles (original + replication) together constitute a complete spec. For me, the added value of replications in ReScience lies more in the article than in the code.

For me, bugs (or errors) are something different (and worse) because they can invalidate results. For example, imagine you are using a fixed seed in your random generator (for debugging) and you forget to remove it before computing statistics over several runs of your model. This may very well invalidate all the results.
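The fixed-seed scenario can be sketched as follows. This is an illustrative toy, not code from any submission: the "model" (the mean of a few Gaussian draws), the seed values, and all function names are assumptions.

```python
import random
import statistics


def run_model(rng, n_samples=100):
    """Toy stochastic 'model': the mean of n_samples standard Gaussian draws."""
    return statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n_samples)])


def runs_with_leftover_seed(n_runs, debug_seed=42):
    """Buggy version: every run re-creates the generator with the same debug seed."""
    return [run_model(random.Random(debug_seed)) for _ in range(n_runs)]


def runs_with_distinct_seeds(n_runs, base_seed=42):
    """Each run gets its own seed, so runs differ but remain reproducible.

    Consecutive integer seeds are a simplification used for this sketch.
    """
    return [run_model(random.Random(base_seed + i)) for i in range(n_runs)]


buggy = runs_with_leftover_seed(10)
fixed = runs_with_distinct_seeds(10)

# The buggy runs are ten copies of one number, so any spread, error bar,
# or significance test computed across them is meaningless.
print("spread across runs, leftover seed :", statistics.pstdev(buggy))
print("spread across runs, distinct seeds:", statistics.pstdev(fixed))
```

Recording the per-run seeds, as in the second function, keeps the results reproducible without making the runs identical.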

oliviaguest · 2016-04-20T20:15:26Z

I think it's just terminology/jargon that we disagree on. Basically 100% agreed. 😄
