
pre-extracted data in .tsv format #140

Open
Digital-XxX opened this issue Jul 11, 2022 · 6 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@Digital-XxX

Please provide the pre-extracted data in .tsv format. GoldenDict mobile cannot read .json dictionaries.

@kristian-clausal added the enhancement (New feature or request) label on Jul 11, 2022
@kristian-clausal
Collaborator

Unfortunately, the data we provide is not suitable for straightforward use as .tsv or .csv. The JSON data is hierarchical: large, fairly sprawling word structures contain smaller structures, dictionaries, and lists, so translating it to .tsv has to be done case by case (for example, deciding which nested fields to keep and how to fit a list of senses into a single column). It is not a universal data format that can be swapped between programs (at least not yet, or in the near future); it is simply data we have put into an ad hoc structure as needed.
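As a rough illustration of that case-by-case flattening, here is a minimal sketch that reduces one nested word entry to a single flat row. The field names (`word`, `pos`, `senses`, `glosses`, `forms`) are assumptions loosely modeled on a Wiktionary word entry, not a guaranteed schema, and the helper names `clean` and `flatten_entry` are hypothetical. Which fields to drop and how to join glosses are arbitrary choices made for this sketch.

```python
from typing import Any


def clean(value: str) -> str:
    # TSV has no standard escaping, so tabs and newlines inside a field are
    # collapsed to single spaces to keep one entry per line.
    return " ".join(value.split())


def flatten_entry(entry: dict[str, Any]) -> list[str]:
    """Reduce one nested word entry to a flat [headword, part of speech, definitions] row.

    Assumed (hypothetical) input shape:
        {"word": str, "pos": str, "senses": [{"glosses": [str, ...]}, ...]}
    Every other nested field (forms, etymology, examples, ...) is silently dropped;
    deciding what to keep is exactly the case-by-case part.
    """
    senses = []
    for sense in entry.get("senses", []):
        glosses = [clean(g) for g in sense.get("glosses", [])]
        if glosses:
            # Glosses belonging to one sense are joined with "; " ...
            senses.append("; ".join(glosses))
    # ... and separate senses with " | ", an arbitrary choice for this sketch.
    return [clean(entry.get("word", "")), clean(entry.get("pos", "")), " | ".join(senses)]


if __name__ == "__main__":
    entry = {
        "word": "dog",
        "pos": "noun",
        "senses": [
            {"glosses": ["A mammal,\tCanis familiaris."]},  # stray tab gets cleaned
            {"glosses": ["A man.", "A contemptible man."]},
        ],
        "forms": [{"form": "dogs", "tags": ["plural"]}],  # nested field that is dropped
    }
    print("\t".join(flatten_entry(entry)))
```

Run directly, this prints `dog`, `noun`, and `A mammal, Canis familiaris. | A man.; A contemptible man.` separated by tabs. Everything under `forms` is lost, which is the kind of trade-off a general-purpose .tsv export cannot decide for every user.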

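To make what you want possible you need to:

  • program a script that will do that translation by reading the json file object-by-object and then outputting it into .tsv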

We welcome any contributions to the project to make it more accessible.
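A script that reads the dump object-by-object and writes .tsv could look roughly like the following minimal sketch. It assumes the input is JSON Lines (one word object per line) and reuses the same assumed `word`/`senses`/`glosses` field names; the script name `jsonl_to_tsv.py` and the `headword<TAB>definition` output layout are hypothetical, so check what the target dictionary tool's importer actually expects.

```python
import json
import sys
from typing import Iterable, Iterator


def one_line(text: str) -> str:
    # Collapse tabs/newlines so every dictionary entry stays on a single TSV line.
    return " ".join(text.split())


def tsv_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield "headword<TAB>definition" rows from JSON Lines input, one object at a time.

    Each line is parsed and discarded before the next is read, so memory use stays
    flat even for very large dumps. The field names are assumptions; adjust them to
    the actual schema, which may change over time.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        word = one_line(entry.get("word", ""))
        glosses = [
            one_line(gloss)
            for sense in entry.get("senses", [])
            for gloss in sense.get("glosses", [])
        ]
        if word and glosses:  # skip entries with nothing to display
            yield f"{word}\t{'; '.join(glosses)}"


def main(argv: list[str]) -> None:
    if len(argv) != 3:
        print("usage: jsonl_to_tsv.py INPUT.jsonl OUTPUT.tsv", file=sys.stderr)
        return
    count = 0
    with open(argv[1], encoding="utf-8") as src:
        with open(argv[2], "w", encoding="utf-8", newline="\n") as dst:
            # Iterating a file object reads it line by line, not all at once.
            for row in tsv_rows(src):
                dst.write(row + "\n")
                count += 1
    print(f"wrote {count} rows to {argv[2]}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```

Usage would be along the lines of `python jsonl_to_tsv.py input.jsonl output.tsv`. Keeping the conversion in a generator separate from the file handling makes it easy to swap in a different output layout, or to feed the rows into another converter, without touching the streaming logic.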

@kristian-clausal added the good first issue (Good for newcomers) label on Jul 11, 2022
@Digital-XxX
Author

Digital-XxX commented Jul 11, 2022

> program a script that will do that translation by reading the json file object-by-object and then outputting it into .tsv

I think pyglossary supports conversion of .json to .tsv/.tab

@kristian-clausal
Collaborator

> program a script that will do that translation by reading the json file object-by-object and then outputting it into .tsv
>
> I think pyglossary supports conversion of .json to .tsv/.tab

We would be happy to have someone implement a conversion utility for our .json to other formats, but someone has to code it first, and our data structure and format can change as time goes by.

@Vuizur
Contributor

Vuizur commented Jul 27, 2022

I created a project that can create TSV, StarDict, and Kindle dictionaries from the kaikki dump. It is not extremely well tested, but it may work: https://github.com/Vuizur/ebook_dictionary_creator

@Vuizur
Contributor

Vuizur commented Aug 22, 2022

I also now have a repository with directly downloadable dictionaries for a lot of languages in 3 different formats: https://github.com/Vuizur/Wiktionary-Dictionaries

@GrimPixel

GrimPixel commented Apr 8, 2024

Here is a new tool: https://codeberg.org/GrimPixel/Text_to_Wordlist
Place your text file in the corresponding directory, 0_text, check text_setting.yaml and dictionary_setting.yaml, then run extract_text.py and extract_dictionary.py to generate a TSV file whose values are separated as described in README.adoc.
