
[WIP] Decoding from the categorical posterior distribution #38

Open · wants to merge 1 commit into master

Conversation

@maxrmorrison

Implements the proposed alternative decoding of the network output from the categorical posterior distribution, as suggested in #37.

Remaining tasks:

  • Test on original data. My tests have all been conducted on speech data. This proposed alternative should be run on the same data as the original continuous decoding for fair comparison.
  • I took the phrasing "a PR that replaces hmmlearn with librosa.sequence.viterbi" literally, so the hmmlearn decoding has been removed. Based on the performance comparison, we may want to allow both decoding options; a sketch of the librosa-based approach is below.
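
For reference, here is a minimal sketch of what Viterbi decoding over CREPE's 360-bin salience output might look like with librosa. The window width, normalization, and function name are illustrative assumptions, not necessarily what this PR ships:

```python
import numpy as np
import librosa

def viterbi_bins(salience, width=12):
    """Decode a pitch-bin path from an (n_frames, 360) salience matrix.

    Sketch only: `width` and the triangular transition window are
    illustrative choices, not the PR's exact parameters.
    """
    n_bins = salience.shape[1]
    # Transition matrix favoring small bin-to-bin jumps between frames:
    # a triangular window spanning 2 * width + 1 bins, rows normalized.
    transition = librosa.sequence.transition_local(
        n_bins, 2 * width + 1, window='triangle', wrap=False)
    # librosa.sequence.viterbi expects observations as (n_states, n_steps),
    # with each column a probability distribution over states.
    prob = salience.T / np.maximum(salience.T.sum(axis=0, keepdims=True), 1e-8)
    return librosa.sequence.viterbi(prob, transition)
```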

@justinsalamon (Collaborator)

@maxrmorrison would it be possible to carve out only the relevant viterbi code, so as to avoid taking on the entirety of librosa as a (heavy) dependency?

@jongwook (Member) commented Sep 1, 2019

I wouldn't mind adding librosa as a dependency.

@justinsalamon (Collaborator)

@jongwook what's your justification? The longer the dependency chain, the greater the technical debt (= more maintenance) and the greater the chance of users running into issues during installation. It also inflates the required storage space and limits deployment on less standard/embedded platforms if some dependencies don't support them.

It seems like all the sequence-modelling code is concentrated in one source file; why not extract only what we need?

@maxrmorrison (Author)

@justinsalamon we can narrow the dependency down to just two functions (librosa.sequence.viterbi and librosa.sequence._viterbi) and a custom error class (librosa.util.exceptions.ParameterError). Those three have no other dependencies within librosa, and I would be okay with extracting them. Where would be the best place to give proper attribution that those functions and the exception class come from librosa?
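
One common pattern, if the functions were vendored, is a module-level notice in the extracted file. The file name and wording below are hypothetical, and the copied code would need to be accompanied by librosa's license text (librosa is ISC-licensed):

```python
# crepe/_viterbi.py (hypothetical vendored module)
#
# The functions `viterbi` and `_viterbi` and the `ParameterError` class
# in this file are copied from librosa (https://github.com/librosa/librosa),
# which is distributed under the ISC license.
# See LICENSE.librosa for the full license text covering the copied code.
```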

@jongwook (Member) commented Sep 2, 2019

@justinsalamon Justifications:

Since we already depend on resampy, I don't think adding librosa would be a substantial roadblock anywhere, especially when we already require all of TensorFlow and Keras. TensorFlow itself is ~500 MB, while a librosa install is just 2.6 MB; for comparison, crepe takes up ~160 MB because of the pretrained weights.

On less standard/embedded platforms, TensorFlow will require special treatment anyway, and I don't think there is a platform where librosa can't run but TensorFlow runs without modification.

Since librosa is the de facto standard library for audio analysis in Python, I expect many "customers" of crepe already have it installed.

In terms of maintenance, by duplicating librosa code we take on the responsibility of backporting any future fixes to librosa's Viterbi code, so it's debatable which approach creates more "technical debt".

@justinsalamon (Collaborator)

I'd be in favor of carving out only the needed functionality, but @jongwook, it's your final call.

@jongwook (Member) commented Sep 5, 2019

It appears that llvmlite, inherited via numba, can sometimes be an issue on ARM and Windows hosts.

I personally had no problem installing librosa on Windows and a Raspberry Pi, so I suspect those issues are isolated. Meanwhile, we rely on numba via resampy anyway, and easy alternatives like SciPy's resampling methods have many drawbacks.

Also, because I intentionally chose not to use librosa, we used scipy.io.wavfile and consequently supported only WAV files. With librosa we can simply call librosa.load(file, sr=16000) to support a wide variety of file formats.
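
For example (a minimal sketch; the file name is a placeholder):

```python
import librosa

# librosa.load decodes many formats (WAV, FLAC, OGG, ...), resamples to
# the model's expected 16 kHz rate, and returns mono float32 audio by default.
audio, sr = librosa.load('input.flac', sr=16000)
assert sr == 16000
```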

So I'd like to keep the current PR, which calls librosa.sequence.viterbi, to preserve separation of concerns and avoid code duplication. Sorry for putting on the dictator hat, @justinsalamon!

@maxrmorrison Thanks for the PR! The code looks good to me, but let me confirm that it definitely improves on the RWC-synth dataset compared to hmmlearn.
