y_true (label) in CTC #11

Open
tuanphan09 opened this issue Mar 14, 2019 · 2 comments
Comments

@tuanphan09

tuanphan09 commented Mar 14, 2019

Hi,
I've just learned about CTC loss, and as far as I know it allows labels of varying length, as long as they are no longer than label_len. For that reason, I don't understand why you needed to pad the labels with '-' (your comment doesn't make sense, by the way):

# due to the explanation of ctc_loss, try to not add "-" for blank
while len(lexicon) < label_len:
    lexicon += "-"

and why you added the '-' symbol to your vocabulary (characters):

characters = '0123456789'+string.ascii_lowercase+'-'
label_classes = len(characters)+1

EDIT:
I found that you need to pad the labels to make the code run. Last question: with, e.g., label='12345---' and label_len=5, CTC just uses label[:label_len] for calculating the loss, right?
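To make the question concrete, here is a minimal pure-Python sketch of the CTC forward algorithm that slices the label to label_len before doing anything else. It is an illustration under that assumption, not the implementation of any particular library; the function name `ctc_neg_log_likelihood`, its parameters, and the toy probabilities are all made up for this example.

```python
import math

def ctc_neg_log_likelihood(probs, padded_label, label_len, blank):
    """Negative log-likelihood of a label under CTC (forward algorithm).

    probs: T rows, each a list of per-class probabilities for one time step.
    padded_label: class indices, possibly padded past label_len.
    label_len: number of real symbols at the start of padded_label.
    blank: index of the CTC blank class.
    """
    label = padded_label[:label_len]  # everything after label_len is dropped

    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S, T = len(ext), len(probs)

    # alpha[s] = probability of all alignment prefixes ending in ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # The blank between two different symbols may be skipped.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last symbol or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood)

# Toy input: 4 time steps, 3 classes, class 2 is the blank (assumed).
probs = [[1 / 3, 1 / 3, 1 / 3] for _ in range(4)]
loss_a = ctc_neg_log_likelihood(probs, [0, 1, 2, 2], label_len=2, blank=2)
loss_b = ctc_neg_log_likelihood(probs, [0, 1, 0, 0], label_len=2, blank=2)
print(loss_a, loss_b)
```

Both calls score the same two-symbol label, so the two losses should be identical regardless of what fills the padded positions.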

@sbillburg
Owner

I tried not adding "-" for blank, but it didn't work well.

I think your method may make sense.

@tuanphan09
Author

tuanphan09 commented Mar 19, 2019

After a few days of reading, I found that we can pad the label with any symbol; it doesn't matter which. We pad because we want every label in the input to have the same size. During training we give CTC the padded label and the true label_len, and the algorithm actually doesn't use the padding part (as in the EDIT to my first question above).
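The padding step described above might look roughly like this. It is only a sketch: `pad_labels`, `max_len`, and `pad_index` are hypothetical names, and the vocabulary here deliberately leaves out '-' to show that the pad value can be any valid index.

```python
import string

def pad_labels(texts, characters, max_len, pad_index=0):
    """Encode strings as fixed-length index lists and record their true lengths."""
    padded, lengths = [], []
    for text in texts:
        if len(text) > max_len:
            raise ValueError(f"label {text!r} is longer than max_len={max_len}")
        indices = [characters.index(ch) for ch in text]
        lengths.append(len(indices))
        # Fill the tail with an arbitrary index; it is meant to be ignored
        # because the true length is passed to the loss alongside the label.
        padded.append(indices + [pad_index] * (max_len - len(indices)))
    return padded, lengths

characters = "0123456789" + string.ascii_lowercase
labels, label_lens = pad_labels(["12345", "ab"], characters, max_len=8)
print(labels)
print(label_lens)
```

Every row of `labels` has the same size, so the batch can be stacked into one array, while `label_lens` carries the information CTC needs to skip the padding.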

For that reason, you can make your model more efficient by changing this:

label_classes = len(characters)
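A possibly clearer way to arrive at the same class count is to leave '-' out of the vocabulary and add one class for the blank explicitly. This sketch assumes the CTC implementation reserves the last class index for the blank; that is an assumption about the library in use, not something stated in this thread, and `blank_index` is a name introduced only for illustration.

```python
import string

# Vocabulary without '-': 10 digits + 26 lowercase letters.
characters = "0123456789" + string.ascii_lowercase

# Assumption: the blank is the last class index.
blank_index = len(characters)

# One output unit per real symbol plus one for the blank.
label_classes = len(characters) + 1

print(label_classes, blank_index)
```

With the original vocabulary that already contains '-', `len(characters)` gives the same number of classes, which is why the extra `+1` is not needed there.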
