The Intel MKL is not Free Software #47

Open
p-e-w opened this issue Feb 26, 2021 · 8 comments

Comments

@p-e-w

p-e-w commented Feb 26, 2021

Hey there, I was really excited to finally see a high-quality Kaldi distribution that can be pip installed. Unfortunately, the Intel Math Kernel Library required to build KaldiAG is not Free Software:

  • Its license, the "Intel Simplified Software License", is not OSI approved
  • The Apache Software Foundation has outright rejected the ISSL "because it is not an open source license"
  • The Kaldi documentation itself states that the MKL "is still a closed-source commercial product"

Therefore, building KaldiAG currently requires nonfree software, and it is questionable whether the binary wheels can even be distributed under the AGPL, as claimed on PyPI.

It would be great if it were possible to build Free Software using KaldiAG. I therefore recommend switching the build process to link against OpenBLAS instead of the MKL by default, especially for the wheels, so that they can be used in Free Software.

Disclaimer: I am not a lawyer and this is not legal advice.

@daanzu
Owner

daanzu commented Feb 28, 2021

You raise a good point. I'm certainly no license expert. Unfortunately, MKL has a significant performance benefit, last I checked. However, I should at least have builds and wheels available using OpenBLAS (which is easier than with MKL anyway). I'll try to do so soon.

@p-e-w
Author

p-e-w commented Mar 1, 2021

The MKL has a significant performance benefit on Intel CPUs. On AMD processors, its performance is terrible, and far worse than OpenBLAS. This has been a known issue for years and is apparently intentional. When a workaround was discovered to force MKL to use fast code paths on AMD as well, Intel promptly updated the MKL to disable that workaround, and as of today there is once again no publicly known method to get reasonable performance on AMD CPUs using MKL.

So with MKL, you get good performance on Intel, and terrible performance on AMD. With OpenBLAS, you get acceptable performance on both. This alone might be reason enough to switch, especially considering how much more popular AMD has become recently.

But performance really isn't the main concern here. There is plenty of nonfree ASR software available that wipes the floor with Kaldi when it comes to speed and accuracy. The primary attraction with a package like this will always be the free software aspect, so depending on nonfree software is a huge problem regardless of any benefits it may bring.

@daanzu
Owner

daanzu commented Mar 2, 2021

Thanks for bringing the issue of performance on AMD to my attention! Devious, but I suppose not that surprising for Intel. That certainly is reason enough for at least making OpenBLAS the default. I should hopefully be able to make the change for the impending 2.0 release.

@shervinemami

shervinemami commented Mar 2, 2021 via email

@p-e-w
Author

p-e-w commented Mar 10, 2021

Sounds great indeed!

Somewhat related, could you please clarify the license for the models? The README states that "this project" is licensed under the AGPL, but the model files are only provided as release artifacts, so it is not obvious that they are covered by the same license. Also, the AGPL is a rather ill-fitting license for data files, since most of its terms deal with "source code" and it's not clear what that means when applied to a language model.

@daanzu
Owner

daanzu commented Mar 16, 2021

@p-e-w Regarding the licensing of the models, they are currently packaged with a license file for the AGPL. I agree that the license doesn't really fit using it with speech models, but I am using it in an attempt to convey the "spirit" of my intent. The licensing situation for speech models seems to be somewhat new and not fully developed, and I'm not sure that any of the major open licenses fit very well, but I'm no expert and am open to suggestions/input.

@p-e-w
Author

p-e-w commented Mar 22, 2021

@daanzu The new release looks great! This issue can probably be closed, unless you want to keep it open to track Windows support for OpenBLAS. I assume OpenBLAS is now also used in the (Linux/macOS) wheels?

I must say I'm very impressed with how well KaldiAG works overall. I've been experimenting with ASR software for years, and KaldiAG is the first Open Source option that provides recognition accuracy that I would consider good enough for serious applications. I don't know if it's the FSTs that guide Kaldi towards more sensible recognition pathways, or the quality of the models, or a combination of both, but somehow KaldiAG ends up leaving the competition in the dust according to my (informal) tests. Congratulations on this impressive achievement!

> Regarding the licensing of the models, they are currently packaged with a license file for the AGPL.

I managed to overlook that somehow. Thanks for pointing it out!

> The licensing situation for speech models seems to be somewhat new and not fully developed, and I'm not sure that any of the major open licenses fit very well, but I'm no expert and am open to suggestions/input.

GitHub maintains the website choosealicense.com to help with such decisions. It has a dedicated page about non-software licenses, which appears to recommend CC0-1.0, CC-BY-4.0, and CC-BY-SA-4.0 for datasets. I've definitely seen datasets for various applications being released under Creative Commons licenses, but whether or not they are an appropriate choice for speech models I cannot say with any certainty.

@daanzu
Owner

daanzu commented Mar 23, 2021

@p-e-w Yep, no MKL anymore in the Linux/Mac wheels. But I will keep this open for the change on Windows, which I didn't want to hold up the release for.
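For anyone who wants to check this on their own build, here is a minimal sketch of one way to do it: scan `ldd`-style dependency output for library names that look like MKL or OpenBLAS. This is not part of KaldiAG. The names `detect_blas` and `BLAS_PATTERNS` are hypothetical, the filename patterns are an assumption about how these libraries are commonly named on Linux, and the sample text is made up for illustration rather than taken from a real wheel.

```python
import re

# Filename patterns assumed to identify each backend (illustrative, not exhaustive).
BLAS_PATTERNS = {
    "mkl": re.compile(r"\blibmkl_\w+\.so"),
    "openblas": re.compile(r"\blibopenblas[\w.-]*\.so"),
}


def detect_blas(ldd_output: str) -> set:
    """Return the set of BLAS backends named in ldd-style dependency text."""
    found = set()
    for line in ldd_output.splitlines():
        for name, pattern in BLAS_PATTERNS.items():
            if pattern.search(line):
                found.add(name)
    return found


# Made-up sample resembling `ldd some_library.so` output.
sample = """\
    libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
"""

print(detect_blas(sample))
```

In practice the input would come from running `ldd` on the compiled shared object inside the installed package. A result containing `"mkl"` would suggest the build still links against the MKL, although a statically linked library would not show up this way.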

I am glad it is working so well for you! Yes, Kaldi does a great job of providing good accuracy combined with the ability to adjust the various parts like lexicon/LM/grammars to fit your domain, which can really boost the accuracy. And thankfully someone has lent me access to a couple of GPUs to train open source models with.

Thanks for the links. I will take a look, and see if I can find a more appropriate license.
