
For Japanese, bug where lemma is blank for word, word reading is off. #231

Open
etherealite opened this issue Apr 30, 2024 · 7 comments

@etherealite

Whoops, found another one.

[Screenshot]

It looks like the word may have been parsed correctly, but the lemma reading and the word reading got mixed up somehow. The lemma is empty.
[Screenshot]

@simjanos-dev
Owner

That's an interesting one. Can you please copy and paste this word here?

@etherealite
Author

etherealite commented Apr 30, 2024

Sure thing, I have it right here from the raw file.

過ごせる (reading すごせる; potential form of 過ごす, "to spend [time]")

Here it is in context:

9
00:00:44,060 --> 00:00:52,000
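もうそういう人は、僕が頑張って働くからこそ、日本ではゴールデンウィークを過ごせる人がいっぱいいるんだ。

(Roughly: "People like that... it's precisely because I work hard that so many people in Japan get to enjoy Golden Week.")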

@simjanos-dev
Owner

Just a note: there's a known issue with Japanese readings, #120.

@simjanos-dev
Owner

That's a weird one. I deleted that single word from my database (don't do this on your production DB), imported the text again, and now it is correct. I'll investigate this further later with a fresh database, using the subtitle file to test it. Please comment here if you find more cases like this. I have used Japanese, but I haven't seen this problem before, or I just haven't noticed it because it's rare.

I also realized that I know this word, I just haven't been reading for a long time. :(

@etherealite
Author

Hey, I'm super impressed that you can keep up more than one language at a time. I hope I don't forget mine either, lol.

You remember this though, right?
助けてくれてありがとう! (Thank you for helping me!)

@simjanos-dev
Owner

> Hey, I'm super impressed that you can keep up more than one language at a time.

I'm not sure what you mean, I only learn Japanese.

> You remember this though, right?
> 助けてくれてありがとう!

Yes, I do!

@simjanos-dev
Owner

Sorry, but I cannot replicate this. This is what I see when I use an empty database, create an .srt file from your example, and import it as a subtitle:

[Screenshot]

It is possible that the word was first imported from another source, and the inaccurate reading was generated there.

Are there other words that have kanji in their reading field? Or did you perhaps use the vocabulary import?
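One way to answer the kanji question above is to scan your saved words for readings that contain kanji, since a correct Japanese reading should be kana only. The following is a minimal sketch, not part of the project: it assumes you have exported your words as hypothetical `(word, reading)` pairs (it does not read the application's real database), and it treats only the CJK Unified Ideographs block and Extension A as kanji, which is a simplification.

```python
# Hypothetical helper: flag vocabulary entries whose reading contains kanji.
# The (word, reading) pairs are illustrative; this does NOT read the
# application's real database or schema.

# Simplification: only these two Unicode blocks are treated as kanji.
KANJI_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
)


def contains_kanji(text: str) -> bool:
    """Return True if any character of `text` falls in a kanji range."""
    return any(
        start <= ord(ch) <= end
        for ch in text
        for start, end in KANJI_RANGES
    )


def suspicious_readings(entries):
    """Yield (word, reading) pairs whose reading contains kanji."""
    for word, reading in entries:
        if contains_kanji(reading):
            yield word, reading


if __name__ == "__main__":
    # Made-up sample data, not taken from the reporter's database.
    sample = [
        ("過ごせる", "すごせる"),  # kana-only reading: looks fine
        ("過ごせる", "過ごせる"),  # reading copied from the word: suspicious
    ]
    for word, reading in suspicious_readings(sample):
        print(f"{word}: reading {reading!r} contains kanji")
```

Any entry this flags would be a candidate for the kind of mixed-up reading described in this issue.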
