Is the Moses Tokenizer in violation of it's license? #2000

Closed
oxinabox opened this issue Apr 10, 2018 · 9 comments

@oxinabox
Contributor

I was looking through the tokenizers and spotted the Moses tokenizer:

https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl

which is ported from the Perl script:
https://github.com/moses-smt/mosesdecoder/blob/ae7aa6a9d25be49ab4c15ec68515e74490af399b/scripts/tokenizer/tokenizer.perl#L3-L4

That script currently says it is licensed under LGPL 2.1+.

As I understand it, one cannot incorporate LGPL 2 or 3 code into an Apache 2 project,
but I am not 100% sure. (If it were full GPL, I know you could not incorporate it.)

@oxinabox changed the title from "Is the Moses Tokenizer n violation of it's license?" to "Is the Moses Tokenizer in violation of it's license?" on Apr 10, 2018
@alvations
Contributor

alvations commented Apr 10, 2018

Yes, it is in violation, sadly.

And it's not possible to get permission from the Moses maintainers: https://www.mail-archive.com/moses-support@mit.edu/msg15864.html

@noe

noe commented Apr 10, 2018

The LGPL has a linking exception:

  5. A program that contains no derivative of any portion of the
    Library, but is designed to work with the Library by being compiled or
    linked with it, is called a "work that uses the Library". Such a
    work, in isolation, is not a derivative work of the Library, and
    therefore falls outside the scope of this License.

Why not separate the Moses-derived work into a new package (e.g. nltk.moses) licensed under the LGPL, and let everyone, including Marian, use it without propagating the LGPL?

@stevenbird
Member

@noe: nice idea - I'd be happy to consider a PR

@noe

noe commented Apr 12, 2018

@stevenbird I'd be glad to contribute. From the Moses mailing list I understand that @alvations may contact the authors to request permission to license the derived code under Apache, in which case nothing in the NLTK code would need to change. @alvations, is that correct?

If that attempt does not succeed: since a new repo cannot be the target of a PR, either NLTK can create a new empty repo to which I would submit a PR, or I can create a new repo with the LGPL code and transfer its ownership to nltk.

@stevenbird
Member

Thanks @noe. Yes, let's wait to hear if @alvations has any luck.

@stevenbird
Member

Oh, looks like the answer is already no.

@alvations
Contributor

alvations commented Apr 18, 2018

Sorry for being away for a while. The short answer is no, until we get everyone on the Moses side to agree (which is hard unless we're at WMT or MTM and can ask almost everyone for permission while they're physically there).

The best solution is to have some sort of LGPL repo for nltk_contrib. All LGPL code would go there, and we would then pull it in with git submodule add.

Additionally, on the Moses side, let's see how far we can push them towards creating a wholly independent module for their Python code, so that we can import it as a dependency from PyPI. (This will take some work, though.)

@alvations
Contributor

I've repackaged MosesTokenizer as a separate library, and I think we can either wrap it in NLTK or add a deprecation message asking users to switch to the new package: https://github.com/alvations/sacremoses

Should I transfer the ownership to the NLTK organization on GitHub? I'm not sure how the Moses community feels about this, so let me ask them first.

Note: SacreMoses is just a continuation of the SacreBLEU chain of tools that came out of the exodus of Moses scripts.
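
For illustration only, the "wrap it or add a deprecation message" option could look roughly like the sketch below. The function name moses_tokenize_deprecated and its backend parameter are hypothetical and not part of NLTK; the fallback assumes the sacremoses package is installed and exposes MosesTokenizer with a tokenize() method.

```python
import warnings


def moses_tokenize_deprecated(text, backend=None):
    """Hypothetical shim: emit a deprecation warning, then delegate.

    `backend` is any callable that maps a string to a list of tokens.
    """
    warnings.warn(
        "This Moses tokenizer wrapper is deprecated; "
        "use the separate sacremoses package instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this shim
    )
    if backend is None:
        # Assumption: sacremoses is installed (third-party, LGPL-licensed).
        # Importing lazily keeps it an optional dependency.
        from sacremoses import MosesTokenizer

        backend = MosesTokenizer(lang="en").tokenize
    return backend(text)
```

Passing a stand-in such as backend=str.split exercises the warning path without sacremoses installed. The lazy import is a design choice so that the LGPL code stays in its own package and is only loaded when a caller actually asks for it.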

@stevenbird
Member

Resolved.
