Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Releasing RNNoise version 0.2 #221

Open
jmvalin opened this issue Apr 15, 2024 · 15 comments
Open

Releasing RNNoise version 0.2 #221

jmvalin opened this issue Apr 15, 2024 · 15 comments

Comments

@jmvalin
Copy link
Member

jmvalin commented Apr 15, 2024

This new release brings many improvements, including improved training, SSE4.1 and AVX2 optimizations, and run-time CPU detection (RTCD).

The distributed models are now trained using only publicly available datasets.

@petterreinholdtsen
Copy link
Contributor

petterreinholdtsen commented Apr 15, 2024 via email

@DarthPJB
Copy link

This release expects the build process to execute a arbitrary-shellscript to download an arbitrary file as part of the build process; outside of cmake, in a potentially privileged environment.

If I can find the time, I will PR fix for your Cmake or alike; I'm sorry I'm not doing it now - because it's very exciting to me this project is moving again.

But as it stands, this is a pretty big hole for practical secure CI-pipelines and reliability in isolated or secure environments.
In short - do not do this.

THAT SAID:

I'm honestly glad this project is getting some much needed attention; but please remember that it's been relied upon for various uses, on various platforms for a very long time now.

If your not considering the need to build this as a standalone Risc-V library for a non-linux operating-system; You should be.
It may need to be built without internet access.
And on windows XP; to support cable TV stations using it with Gstreamer.

And many, many more that I'm aware of.

Keep up the good work, do good things, and have pride - just don't forget the use-cases beyond your own. ;)

@petterreinholdtsen
Copy link
Contributor

petterreinholdtsen commented Apr 16, 2024 via email

@jmvalin
Copy link
Member Author

jmvalin commented Apr 22, 2024

So there's two things here: one is the download script and the other is the actual content being fetched. In terms of the script itself, it's fairly simple to audit and see that it's not doing anything bad. It's also in git and signed, so I don't really see a trust issue compared to trusting the rest of the source code.

When it comes the the actual model data being fetched, I can see how it's not as safe as the rest of the Git content (sha1 and signature). One possible way to solve that would be the equivalent to what was just added to Opus:
xiph/opus@9faf6f071

The idea Is that Git will hold the sha256 of the downloaded content and thus it can check that the checksums actually match before using it. That way compromising the server will just result in a build failure.

@DarthPJB
Copy link

The idea Is that Git will hold the sha256 of the downloaded content and thus it can check that the checksums actually match before using it. That way compromising the server will just result in a build failure.

This is without question an acceptable solution - I come from a version of linux called NixOS; our package manager does this for all external downloads as part of the build step; ( which is how I came to be here: NixOS/nixpkgs#304433 )

The shellscript itself is an acceptable compromise if an external download is absolutely required; in my personal ideal external data would be fetched by git submodule (providing a solid revision control), or the dependency treated as such.

It seems the model for RNNoise has always been a dependency;

@petterreinholdtsen
Copy link
Contributor

petterreinholdtsen commented Apr 23, 2024 via email

@jmvalin
Copy link
Member Author

jmvalin commented Apr 23, 2024

@petterreinholdtsen Do you have alternatives to suggest for distributing a 20 MB binary file? In version 0.1, the model was much smaller (and it still wasn't great) and didn't include the binary model (which is the preferred form for making modifications in that case).

@petterreinholdtsen
Copy link
Contributor

petterreinholdtsen commented Apr 23, 2024 via email

@tdaede
Copy link
Collaborator

tdaede commented Apr 24, 2024

All package managers I know of have the ability to specify multiple source packages - I'd expect them to specify the model data as one of the sources. The nixpkgs approach seems good.

Having a hash of the data in git should be enough for revision control to ensure that it has the same version of the model as the code was designed for. Submodules remain tricky to use, especially because the model is large and most people would want a shallow clone of it.

@jmvalin
Copy link
Member Author

jmvalin commented May 20, 2024

OK, the download process should be safer now as the download script will verify the sha256 before using the downloaded binary file.

@petterreinholdtsen
Copy link
Contributor

petterreinholdtsen commented May 20, 2024 via email

@DarthPJB
Copy link

[Thomas Daede]
[SNIP]
Note, for release tarballs, such approach will break the deserted island test of free software.
[SNIP]
-- Happy hacking Petter Reinholdtsen

I have to agree with this statement - although one would hope that the dependency will find it's way onto a debian disk too ;)

@jmvalin
Copy link
Member Author

jmvalin commented May 20, 2024

Unless I missed something, the RNNoise 0.2 passes the desert island test fine. The download mechanism only applies to Git.

@petterreinholdtsen
Copy link
Contributor

petterreinholdtsen commented May 20, 2024 via email

@tdaede
Copy link
Collaborator

tdaede commented May 21, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants