train_size referenced before assignment #93

Open
bayethiernodiop opened this issue Mar 8, 2019 · 9 comments

Comments

@bayethiernodiop

Hello, when I run the create-corpora -d corpora -f out/clips.tsv command, I get the following error, raised at line 115: UnboundLocalError: local variable 'train_size' referenced before assignment.
Help please.

@bayethiernodiop
Author

After checking the code, the bug is that if a given language has no validated audio, train_size is never initialized. The reason is that len(validated) is 0.
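
To illustrate the failure mode, here is a minimal sketch. The function name split_sizes and the 80% ratio are made up for illustration and are not taken from the corpora creator's source; only the pattern matters: the variable is assigned inside a branch that is skipped when the input is empty.

```python
def split_sizes(validated):
    """Hypothetical stand-in for the sizing logic, not the real code."""
    if len(validated) > 0:
        # train_size is only bound when there is at least one validated clip.
        train_size = len(validated) * 8 // 10
    # With an empty input the name was never bound, so this line raises.
    return train_size


def reproduce():
    """Return the error text produced for a language with no validated clips."""
    try:
        split_sizes([])
    except UnboundLocalError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(reproduce())
```

The exact wording of the message depends on the Python version, but in each case it names train_size.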

@kdavis-mozilla
Contributor

This was created to deal with the release clips.tsv and not a clips.tsv generated after the release. The release clips.tsv did not have any languages without validated clips.

That said, crashing, as it does now, or issuing a warning and exiting seem like the only reasonable behaviors in this situation.

What do you think?
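
For reference, the two behaviors could look roughly like the sketch below. Everything here is an assumption for discussion: compute_train_size, NoValidatedClipsError, the strict flag, and the 80% ratio do not exist in the current code.

```python
import sys


class NoValidatedClipsError(ValueError):
    """Hypothetical error for a language that has no validated clips."""


def compute_train_size(locale, validated, strict=True):
    """Return a train size, handling the empty case explicitly.

    strict=True fails loudly with a descriptive error;
    strict=False prints a warning to stderr and returns 0.
    """
    if len(validated) == 0:
        message = f"no validated clips for locale {locale!r}"
        if strict:
            raise NoValidatedClipsError(message)
        print(f"WARNING: {message}", file=sys.stderr)
        return 0
    # Illustrative ratio only; the real split logic is not reproduced here.
    return len(validated) * 8 // 10
```

Either way the failure would name the language instead of surfacing as an UnboundLocalError.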

@bayethiernodiop
Author

Ah, OK, I see. I was thinking of ignoring the validation set for languages without validated recordings.

@bayethiernodiop
Author

And print a warning, because it may be a problem for those deploying Common Voice locally.

@kdavis-mozilla
Contributor

@Gregoor What would be your desired behavior for the scripts that create the Common Voice releases?

My gut says a catastrophic failure might be of more use in that situation as it will definitely be noticed while a "silent" error being printed might not be noticed and data sets would go out without data.

@Gregoor
Contributor

Gregoor commented Mar 12, 2019

My take would be that we should notice at some other point that we don't have validated data for a language. The bundler script also gathers stats, in part based on the corpora creator's result, which would show if a validated set is empty for a language.

@kdavis-mozilla
Contributor

@Gregoor I guess my question is this: If the stats are gathered, will they be reliably looked at?

There are 96 languages in Pontoon, so it's not unlikely that if 96 or more languages are released, one stat will be overlooked.

However, if the bundler script doesn't run to completion, then no stats are there, and that will be noticed.

@Gregoor
Contributor

Gregoor commented Mar 12, 2019

Yeah that's a fair point, probably not. But I think the bundler would be a better place to act on this. And I guess we have to decide how we'd wanna act on it (maybe exclude the language from release?).

@bayethiernodiop
Author

I think excluding it from the release is a good choice.
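
Something like the sketch below is what I have in mind for the exclusion. It is only an assumption about how it could be done: partition_locales and the shape of validated_counts (locale code mapped to number of validated clips) are hypothetical and not part of the bundler.

```python
def partition_locales(validated_counts):
    """Split locales into those to release and those to exclude.

    validated_counts maps a locale code to its number of validated clips.
    Locales with zero validated clips are excluded but still reported,
    so the gap stays visible instead of being dropped silently.
    """
    releasable = sorted(loc for loc, n in validated_counts.items() if n > 0)
    excluded = sorted(loc for loc, n in validated_counts.items() if n == 0)
    return releasable, excluded


if __name__ == "__main__":
    keep, skip = partition_locales({"en": 10, "wo": 0, "fr": 3})
    print("release:", keep)
    print("excluded (no validated clips):", skip)
```

Returning the excluded list alongside the releasable one leaves room to fail the run or print a summary, depending on what is decided above.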
