added en-ja to MUST_C_V2 dataset (ST) recipes #5648

Open
wants to merge 17 commits into
base: master

@jasonmusespresso jasonmusespresso commented Feb 1, 2024

What?

Added and tested two new target languages.

  • Updated dataset names.
  • Increased the vocab size for en-zh from 4k to 8k.

Why?

language expansion for must_c_v2

Attentional Enc-Dec (st_train_st_conformer_raw_en_ja_bpe_tc4000_sp)

cd espnet
git checkout c25f8762a6f9b7c7c5739fe3e1e72c077e566a60
pip install -e .
cd home/jbao/must_c_v2_st1
./run.sh --skip_data_prep false --skip_train true --download_model jasonmusespresso/must_c_v2_st_train_st_conformer_raw_en_ja_bpe_tc4000_sp_valid
  • Environments
    • python version: 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
    • espnet version: espnet 202402
    • pytorch version: pytorch 2.1.0
| dataset | score | verbose_score |
|---|---|---|
| decode_st_conformer_st_model_valid.acc.best/tst-COMMON.en-ja | 11.5 | 42.1/17.2/8.1/4.0 (BP = 0.932 ratio = 0.935 hyp_len = 51925 ref_len = 55563) |
| decode_st_conformer_st_model_valid.acc.best/tst-HE.en-ja | 12.2 | 42.0/17.6/8.3/4.2 (BP = 0.960 ratio = 0.961 hyp_len = 12241 ref_len = 12744) |
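
As a sanity check on the reported numbers, the headline score can be recomputed from the verbose_score fields. The sketch below assumes the standard corpus-level BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions). The helper name `bleu_from_verbose` is illustrative only and is not part of ESPnet or sacreBLEU. Since the logged precisions are rounded to one decimal place, the result can only be expected to agree to the printed precision.

```python
import math


def bleu_from_verbose(precisions, hyp_len, ref_len):
    """Recompute (brevity penalty, BLEU) from n-gram precisions given in percent.

    Hypothetical helper for illustration; assumes the standard BLEU formula.
    """
    # Brevity penalty: 1 when the hypothesis is at least as long as the reference,
    # otherwise exp(1 - ref_len / hyp_len).
    bp = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    # Geometric mean of the n-gram precisions (still in percent).
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    return bp, bp * geo_mean


# Values copied from the two result rows above.
rows = {
    "tst-COMMON.en-ja": ([42.1, 17.2, 8.1, 4.0], 51925, 55563),
    "tst-HE.en-ja": ([42.0, 17.6, 8.3, 4.2], 12241, 12744),
}

for name, (precisions, hyp_len, ref_len) in rows.items():
    bp, bleu = bleu_from_verbose(precisions, hyp_len, ref_len)
    print(f"{name}: BP = {bp:.3f}, BLEU = {bleu:.1f}")
```

Both rows come out consistent with the reported BP and score, which suggests the table was copied from the scoring log without transcription errors.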

See also

@jasonmusespresso
Author

FYI @brianyan918

@ftshijt ftshijt added Recipe ST Speech translation labels Feb 2, 2024
@ftshijt ftshijt added this to the v.202312 milestone Feb 2, 2024
@weikeduo1

must_c_v2/asr1 is missing data.sh

@kan-bayashi kan-bayashi modified the milestones: v.202312, v.202405 Feb 6, 2024
@jasonmusespresso jasonmusespresso marked this pull request as ready for review February 8, 2024 13:58
@sw005320
Contributor

sw005320 commented Feb 8, 2024

  • @brianyan918, can you review it?

  • @jasonmusespresso, please add the results and config files. Also, please upload the model and put the link to the results.

@brianyan918
Contributor

LGTM thanks @jasonmusespresso

codecov bot commented Feb 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.96%. Comparing base (f6f011d) to head (823898d).

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5648       +/-   ##
===========================================
+ Coverage   14.24%   48.96%   +34.72%     
===========================================
  Files         757      494      -263     
  Lines       69304    43593    -25711     
===========================================
+ Hits         9872    21346    +11474     
+ Misses      59432    22247    -37185     
Flag Coverage Δ
test_integration_espnet2 48.96% <ø> (?)
test_python_espnetez ?

Contributor

we should not change it with our local setup

@mergify mergify bot added the README label Mar 29, 2024
@jasonmusespresso jasonmusespresso changed the title added en-zh, and en-ja to MUST_C_V2 dataset (ST) recipes added en-ja to MUST_C_V2 dataset (ST) recipes Mar 29, 2024
@sw005320
Contributor

It looks good to me, but I want to have input from @brianyan918

@brianyan918,

  • can you confirm the change for sacrebleu_opt_extra?
  • compared with the other recipes, the README.md style is different, which is fine. However, I could not find the model link for all models. @jasonmusespresso seems to follow this style. Is it OK? I think it is better to provide the model link.

@brianyan918
Contributor

The sacrebleu_opt_extra looks good, but shouldn't there be some value passed from run.sh? @jasonmusespresso

@jasonmusespresso
Author

@brianyan918 Yes, added it. In my run, I pass it when calling ./run.sh (e.g. ./run.sh --stage 13 --stop-stage 17 --sacrebleu_opt_extra "-tok ja-mecab -l en-ja --smooth-method exp"...), which should also work.
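
For readers unsure why the quoting matters here: the whole option string has to reach run.sh as one argument, and only later be split into separate scoring options. The sketch below illustrates that with Python's standard `shlex` module. It assumes the recipe eventually expands the string with ordinary shell word splitting, which this thread does not confirm, and it uses only the options shown in the command above (the trailing arguments elided there are left out).

```python
import shlex

# Option string exactly as quoted in the comment above.
opt_extra = "-tok ja-mecab -l en-ja --smooth-method exp"

# The run.sh invocation as an argument list: opt_extra stays a single element.
run_cmd = [
    "./run.sh",
    "--stage", "13",
    "--stop-stage", "17",
    "--sacrebleu_opt_extra", opt_extra,
]

# shlex.join adds the quoting needed to paste the command into a shell.
print(shlex.join(run_cmd))

# Assumption: the scoring stage splits the string on whitespace like a shell would.
sacrebleu_args = shlex.split(opt_extra)
print(sacrebleu_args)
```

Without the quotes, `ja-mecab`, `-l`, and the rest would be parsed as separate arguments to run.sh rather than as part of the value of `--sacrebleu_opt_extra`.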

@sw005320 sw005320 added the auto-merge Enable auto-merge label Apr 10, 2024
6 participants