Add Multiresolution HuBERT as an additional upstream model #517
Conversation
Minor fix to s3prl-vc for new librosa versions
@leo19941227 Hi Leo, the PR is ready for review now (I also fixed the Hugging Face repo for a range of pre-trained models). I will continue to add corresponding docs to the documentation page.
Hi @ftshijt ! Sure, I will review the changes this weekend. Thanks so much!
multires_hubert_base
~~~~~~~~~~~~~~~~~~~~

- Unlabeled Speech: LibriSpeech 960hr
- K-means extracted from `hubert_base`_

multires_hubert_large
~~~~~~~~~~~~~~~~~~~~~

- Unlabeled Speech: LibriLight 60khr
- K-means extracted from `hubert_base`_

multires_hubert_multilingual_base
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Unlabeled Speech: Voxpopuli 100khr
- K-means extracted from `hubert_base`_

multires_hubert_multilingual_large400k
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Unlabeled Speech: Voxpopuli 100khr
- K-means extracted from `hubert_base`_
- Training steps: 400k

multires_hubert_multilingual_large600k
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Unlabeled Speech: Voxpopuli 100khr
- K-means extracted from `hubert_base`_
- Training steps: 600k
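As a quick orientation for readers of this doc diff: once these entries are registered, each model should be loadable through S3PRL's standard hub interface. The sketch below is illustrative only, not part of this PR; it assumes the entry name matches the heading above and that the upstream follows the usual S3PRL output convention.

```python
# Illustrative sketch (not part of this PR): loading one of the new
# upstreams via S3PRL's hub interface, assuming the entry name matches
# the documentation heading above.
import torch
import s3prl.hub as hub

model = getattr(hub, "multires_hubert_base")()  # downloads the checkpoint on first call
model.eval()

wavs = [torch.randn(16000)]  # a list of 16 kHz waveforms, one tensor per utterance
with torch.no_grad():
    outputs = model(wavs)

# S3PRL upstreams return a dict; "hidden_states" holds the per-layer features.
print(len(outputs["hidden_states"]), outputs["hidden_states"][0].shape)
```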
For the K-means extraction source: the current document implies that all the models use the K-means from the 2nd iteration of HuBERT Base. Could you help double-check whether this is true? If so, it means `multires_hubert_base` is effectively the 3rd iteration of HuBERT training, which might not be directly comparable to `hubert_base`.
Yeah, the current note is true. All the provided models should be considered the 3rd iteration of HuBERT training (not directly comparable to `hubert_base`). In our paper, we also conducted experiments with a "comparable" HuBERT trained with the same 3rd-iteration k-means.
In that case, do you suggest I submit those models (3rd-iteration HuBERT) to s3prl as well?
Oh, I think adding the 3rd iteration of HuBERT is not necessary. I was just curious and making sure that the note is correct. I have no further issues then. (You can still submit the 3rd-iteration HuBERT if you want.)
Thanks @ftshijt !
I have a few comments, please help check them.
Thanks!
Many thanks for the review! I've removed the unused legacy note. I'm not sure whether we also want to add the 3rd-iteration HuBERT base as discussed above, but even in that case I can probably just add it to the s3prl/hubert Hugging Face repo and leave it there for people interested in a fair comparison between HuBERT and MR-HuBERT (let me know if you have any better ideas~).
Hi @ftshijt , I am good with the current changes and I am ready to merge this.
Thanks again for the review. I would prefer to merge this PR as-is; for the 3rd-iteration HuBERT, I will commit it to the Hugging Face repo directly for reference purposes.
Sounds good!
As per the discussion in #515, this PR adds Multiresolution HuBERT as an additional upstream model.
TODOs:

- Add the new upstream entries to `hubconf.py`

Reference PR in fairseq:
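For context on the `hubconf.py` item: S3PRL upstreams are typically exposed as hub entry functions that resolve a checkpoint and wrap it in an upstream expert. The sketch below is purely hypothetical; `UpstreamExpert` and `_urls_to_filepaths` stand in for whatever this PR's actual implementation uses, and the checkpoint URL is elided.

```python
# Hypothetical sketch of a hub entry for the new upstream; the names and
# URL below are placeholders, not the actual code added in this PR.
def multires_hubert_base(refresh=False, **kwargs):
    """Base model: LibriSpeech 960hr, k-means from hubert_base."""
    ckpt_url = "https://huggingface.co/..."  # actual checkpoint URL elided
    ckpt = _urls_to_filepaths(ckpt_url, refresh=refresh)  # download/cache helper (placeholder)
    return UpstreamExpert(ckpt, **kwargs)
```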
Btw, during the evaluation of the VC task, I also fixed a minor bug related to recent updates in librosa.
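For reference, the librosa breakage mentioned here is most likely the change in librosa 0.10, which made the sample-rate arguments of functions such as `librosa.resample` keyword-only. The exact call site in s3prl-vc is not shown here, but a typical before/after looks like:

```python
import numpy as np
import librosa

y = np.random.randn(16000).astype(np.float32)

# Before (librosa < 0.10): positional sample rates were accepted.
# y_24k = librosa.resample(y, 16000, 24000)  # raises TypeError on librosa >= 0.10

# After: orig_sr and target_sr must be passed as keywords.
y_24k = librosa.resample(y, orig_sr=16000, target_sr=24000)
```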