Book Language Metadata does not change to English if Only generating Word Wise #142

Open
3 tasks done
gloverd opened this issue Aug 17, 2023 · 10 comments
Labels
help wanted Extra attention is needed

Comments

@gloverd

gloverd commented Aug 17, 2023

Checkboxes

  • I have read the document at xxyzz.github.io/WordDumb.
  • I have not found a similar issue or discussion on GitHub.
  • Reboot doesn't fix the problem.

Describe the bug

The documentation says that the book's metadata language will be changed to English automatically for non-English books. I don't think that is working when clicking the "Create Word Wise" sub-menu item:
image

It goes through the steps of "Generating Word Wise", but when you open the book on a Kindle, it does not show that Word Wise is available. Screenshots of: Book Metadata, Job Details, Kindle screen showing no Word Wise
image
image
image

If the metadata language is manually set to English, then it does generate as expected, which indicates that the language is not being switched automatically. Screenshots of: Book Metadata, Job Details, Kindle screen showing Word Wise now working
image
image
image

I will be opening a separate issue for the Word Wise results, which, as you can see from the last screenshot, only look up English words. I don't think it's related to issue 141, but I have not been able to fix it with the 3.29.6 release from the artifacts.

System Information

OS: win10
Calibre: 6.24.0
python: 3.11
plugin ver: 3.29.6 (Installed from Artifacts)

Error message

No *Error* message appears.

Reproduce steps

  1. Set Book Metadata language to a non-English language.
  2. Click Create Word Wise in the Word Dumb dropdown menu.
  3. Open Book on Kindle.

Screenshots or videos

No response

@xxyzz
Owner

xxyzz commented Aug 17, 2023

The documentation is kind of confusing and needs an update. The code makes a copy of the book, sets the language of the copy to English, and sends that copy to the Kindle, because Word Wise is only enabled for English books. If you set the book language to English yourself, the plugin will assume the book is in English and only look for English words.
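
As a rough illustration of that behaviour (a sketch only, not the plugin's actual code; `Book`, `lookup_language`, and `copy_for_kindle` are hypothetical names made up for this example), the language used for word lookup comes from the library metadata, while only the copy sent to the Kindle is tagged as English:

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for a book record; not a calibre or WordDumb class.
@dataclass(frozen=True)
class Book:
    title: str
    language: str  # language code from the library metadata, e.g. "fr"


def lookup_language(book: Book) -> str:
    """Language whose words get looked up: taken from the library metadata."""
    return book.language


def copy_for_kindle(book: Book) -> Book:
    """Return a copy tagged as English; the library entry itself is unchanged."""
    return replace(book, language="en")
```

Under these assumptions, a book left as `"fr"` in the library is still looked up as French and only its Kindle copy carries `"en"`, whereas a book manually set to `"en"` is looked up as English.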

Could you upload the Word Wise database file created when the book language is French?

@xxyzz
Owner

xxyzz commented Aug 17, 2023

Since this issue is not related to the solved issue 141, I'll answer your questions here:

  • Gloss length ratio is only used for EPUB books, and the pop-up notes are still there.
  • The gloss is shown in the calibre viewer because it's an EPUB book.
  • Only the default Chinese Word Wise db file is replaced; the English file is not touched. If you need the default Chinese file, connect to Wi-Fi and the Kindle will redownload it. Or you can keep a copy somewhere.
  • Maybe choose a smaller model if your machine is struggling with the load? You can delete the worddumb-lemmas folder in the calibre plugin folder; all downloaded Word Wise data files are saved there.

@gloverd
Author

gloverd commented Aug 20, 2023

For some reason I can no longer run "Generate Word Wise" on .mobi books. I've tried clean installs of the plug-in and removing the associated folders under %APPDATA%, but the job just keeps running, whereas in the past it would at least complete, sometimes in seconds but most often in a few minutes (as per the screenshots in #141). It will run on EPUB files. I wonder if I corrupted the book somehow as part of this... This is one of the previously generated files I had on my Kindle.

In order to upload, I renamed the .kll to .txt
LanguageLayer.en.BBB2IHO521.txt

In this one, for example, I see the French word "Morale" picked up with the gloss "Moral"; other pairs are (Talons, griffe), (Savants, savant), (Instant, immédiat, instantané), (unique, unique), ...

@xxyzz
Owner

xxyzz commented Aug 20, 2023

I think it runs so slowly with French books because the default settings have too many enabled lemmas.

I also fixed a bug for KFX books: ba6582e, but you're using a MOBI book?

@gloverd
Author

gloverd commented Aug 20, 2023

I've tried KFX, MOBI, and EPUB in the past. I have this running in the background right now: I downloaded a new out-of-copyright book (Les Misérables) as an EPUB file, converted it to MOBI, and am running only "Generate Word Wise" (not the full WordDumb button). It has been running for 30 minutes at this point.
image

You may be onto something about the size, because I can run it for English books quite fast. As for trying with fewer lemmas: if I uncheck the enabled box in the customize Kindle Word Wise pop-up for a whole series of words, will that improve performance, or does the fact that it still has to look up each word before determining whether it is enabled prevent significant improvements?
image

@gloverd
Author

gloverd commented Aug 22, 2023

I disabled the lemmas under difficulty 5 and 4, and it finally produced the expected result. Some of the lemmas in 5 are probably WAY too common in text (it has words like "it", "not/no", "the (plural)", "a"), and level 4 also has some very common words, so I'm sure they are bogging it down.

It took 5.5 hours to save the updated lemma file. I tried to export it and re-import it, but I don't think that's possible? The exported file doesn't seem to have any information about the level or enabled status, and I'm not sure if I can just rename it to enable its import.

After a computer restart, though, it no longer works. I am going through the process of re-saving the lemmas and will retry.

@xxyzz
Owner

xxyzz commented Aug 22, 2023

When the "save" button is clicked, the code creates a file for spaCy to use later. Maybe too many words are enabled by default, so this process is very slow. You can use SQL to disable a large number of rows with a single query on the db file worddumb-lemmas/fr/wiktionary_fr_fr_v0.db (with the SQLite command line or https://sqlitebrowser.org):

UPDATE senses SET enabled = 0 WHERE difficulty < 3;

Then click the save button; it should run faster. I should enable far fewer words by default, but I haven't found a better data source to convert to the difficulty value.
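
For anyone who prefers a script to the SQLite shell, here is a minimal sketch that runs the same UPDATE with Python's built-in `sqlite3` module. It assumes only what the statement above implies, namely a `senses` table with `enabled` and `difficulty` columns; the function name `disable_senses_below` is made up for this example, and it is worth copying the db file somewhere first.

```python
import sqlite3
from pathlib import Path


def disable_senses_below(db_path, threshold=3):
    """Disable every row in `senses` whose difficulty is below `threshold`.

    Returns the number of rows matched by the UPDATE.
    """
    path = Path(db_path)
    if not path.is_file():
        # sqlite3.connect() would silently create an empty file otherwise.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(str(path))
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE senses SET enabled = 0 WHERE difficulty < ?",
                (threshold,),
            )
            return cur.rowcount
    finally:
        conn.close()
```

Calling `disable_senses_below(".../worddumb-lemmas/fr/wiktionary_fr_fr_v0.db")` with the path to your own calibre plugin folder applies the query shown above; close calibre first so the file is not in use.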

The export feature is for creating Anki cards. Your settings for lemmas are saved to the db file.

@gloverd
Author

gloverd commented Aug 22, 2023

That really has helped!
Saving new lemmas is down to 43m from 330m, and per-book Word Wise generation is about 70% faster!

@xxyzz
Owner

xxyzz commented Aug 26, 2023

I tested a French book in KFX and AZW3 format, and both have working Word Wise now. But to get a better-quality set of French words enabled by default, data similar to what is used to choose the default English and Chinese words is needed: https://github.com/xxyzz/Proficiency

xxyzz added the "help wanted" label on Aug 27, 2023
@xxyzz
Owner

xxyzz commented Feb 24, 2024

97394c9 should improve the speed of the save lemmas job. You can download the test version from here: https://github.com/xxyzz/WordDumb/actions/runs/8028950382
