Request: translate this project #48
Comments
Do you mean having separate files for each language or adding the translations of original word next to that word? E.g.
|
Of course, each language needs to have separate files ( |
I agree that it'd be more manageable this way. I assume we will have at most 1-2 new abbreviations or changes per week, so maintaining the translations is only a little extra work. However, we need dedicated contributors for each language, otherwise the translations will easily get out of sync. Maybe we should store the abbreviations in a YAML/JSON file like: {
"software": {
"translations": {
"ZH": "软件",
"JA": "ソフトウェア"
},
"recommended": "sw",
"not recommended": ["softw", "sware"],
"context based": {
"some context": "blah",
"some context2": "blah2"
}
}
} and from this file we can easily generate the READMEs with CI. |
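A minimal sketch of what that generation step could look like, assuming the JSON structure proposed above. The function name `render_readme` and the table layout are made up for illustration and are not part of the project:

```python
import json
from typing import Optional

# Sample database, copied from the structure proposed above.
DB_JSON = """
{
  "software": {
    "translations": {"ZH": "软件", "JA": "ソフトウェア"},
    "recommended": "sw",
    "not recommended": ["softw", "sware"],
    "context based": {"some context": "blah"}
  }
}
"""


def render_readme(db: dict, lang: Optional[str] = None) -> str:
    """Render a Markdown table; `lang` adds an optional translation column."""
    header = ["Word", "Recommended", "Not recommended"]
    if lang:
        header.insert(1, lang)
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for word in sorted(db):  # rows are sorted by word, not by abbreviation
        entry = db[word]
        row = [
            word,
            entry.get("recommended", ""),
            ", ".join(entry.get("not recommended", [])),
        ]
        if lang:
            # Missing translations are rendered as an empty cell.
            row.insert(1, entry.get("translations", {}).get(lang, ""))
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_readme(json.loads(DB_JSON), "JA"))
```

A CI job could run a script like this once per language and commit the resulting files.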
I agree with @kisvegabor: generating the READMEs from JSON would be easier to maintain, and at the same time we would fix the JSON problem for creating the website. |
We should also have something like an index file listing only the languages we provide.
Then we iterate over this array and take only the EN abbreviation and the corresponding translation from the object. |
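Roughly, that iteration could look like the sketch below. The `LANGUAGES` list stands in for the index file, and the function name is hypothetical:

```python
# Hypothetical index: only the languages we actually provide.
LANGUAGES = ["ZH", "JA"]

# Sample entry using the JSON structure proposed earlier in the thread.
DB = {
    "software": {
        "translations": {"ZH": "软件", "JA": "ソフトウェア"},
        "recommended": "sw",
    },
}


def pairs_per_language(db, languages):
    """Map each language code to (EN abbreviation, translated word) pairs."""
    result = {}
    for lang in languages:
        result[lang] = [
            (entry["recommended"], entry["translations"][lang])
            for _, entry in sorted(db.items())
            # Words without a translation for this language are skipped.
            if lang in entry.get("translations", {})
        ]
    return result


if __name__ == "__main__":
    for lang, pairs in pairs_per_language(DB, LANGUAGES).items():
        print(lang, pairs)
```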
When a new language is started, we assume that its first contributor has translated 100% of the current abbreviations/words, so it doesn't matter if it falls out of sync later. Words added later that are not yet translated will make it easier for people reading the page in that language to realize that contributions are needed, like the red links on Wikipedia for pages that don't exist. Untranslated words could also link to a contribution guide or something similar to encourage people to submit translations.
I think it would be great to do that. We need to first determine what content a word may require. |
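To make the "red link" idea concrete, here is a small sketch. The `CONTRIBUTING.md` path and the placeholder text are assumptions, not something the project has decided on:

```python
# Hypothetical path to a contribution guide.
CONTRIBUTING_LINK = "CONTRIBUTING.md"


def translation_cell(entry: dict, lang: str) -> str:
    """Return the translation, or a Markdown link inviting a contribution."""
    translation = entry.get("translations", {}).get(lang)
    if translation:
        return translation
    # Plays the role of a Wikipedia red link: visibly missing, one click to help.
    return f"[translation needed]({CONTRIBUTING_LINK})"


if __name__ == "__main__":
    entry = {"translations": {"JA": "ソフトウェア"}, "recommended": "sw"}
    print(translation_cell(entry, "JA"))
    print(translation_cell(entry, "ZH"))
```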
Yes, initially we only maintain EN. We now need to determine what information an abbreviation contains:
BTW, our sorting should be based on abbreviations rather than words themselves, because an abbreviation may correspond to many words, and putting them together is helpful for retrieval. |
Nope. We swapped the list just a few PRs ago. When someone is searching for an abbreviation, they only know the word, so it makes more sense to leave it as it is. I assure you that sorting them by abbreviation was a mess. |
We need
I just wanted to comment the same 🙂 |
En README. Any other lang README. |
Realized that we can also use
to create an index README with the available langs. |
As this project grows, I'm not sure if a readme file can be as long as a dictionary. Maybe alphabetically?
|
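If we do split alphabetically, the grouping itself is cheap to script. A sketch, with made-up sample words and a hypothetical function name:

```python
from itertools import groupby


def group_by_initial(words):
    """Group words by their upper-cased first letter, sorted alphabetically."""
    ordered = sorted((w for w in words if w), key=str.lower)
    return {
        letter: list(group)
        for letter, group in groupby(ordered, key=lambda w: w[0].upper())
    }


if __name__ == "__main__":
    # Each key could become its own section or its own file.
    print(group_by_initial(["software", "application", "address"]))
```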
OK, I get it: words are unique, abbreviations are not. I now agree with that idea.
Of course.
Another improvement I think we can make is context: some words don't make sense even when written out in full, so they need a description or a link to a Wikipedia page for people to understand what they mean.
That would be great. How about this? |
Is it possible to create a separate JSON file for each language? I think one huge JSON file will be complicated to maintain as the project grows, since it requires manual merging. Or simply use a plain text file, comma-separated, one word per line. Because the raw data has to be maintained by humans, we don't have to make it machine readable; scripts can adapt to it. |
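For the plain-text variant, a script could adapt along these lines. This is only a sketch: it assumes one `word,translation` pair per line in each language file, which is my guess at the layout, not an agreed format:

```python
import csv
import io

# Hypothetical content of a per-language file (here: ZH).
SAMPLE_ZH = """\
software,软件

hardware
"""


def parse_translations(text: str) -> dict:
    """Parse `word,translation` lines into a dict, skipping incomplete lines."""
    result = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[1].strip():
            continue  # blank line, or a word that has no translation yet
        result[row[0].strip()] = row[1].strip()
    return result


if __name__ == "__main__":
    print(parse_translations(SAMPLE_ZH))
```

Humans edit the simple file; everything else (JSON for the website, READMEs) can be generated from the parsed dict.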
I agree.
It depends on whether we add the acronyms or not. Without them we could have only 1 file/language which would be easier to read and search.
I don't think it's a good idea, because this way we need to maintain and keep in sync the whole I suggest having a single DB file with all words, abbreviations, links, translations, etc. It can be large, but it has a simple structure which is easy to follow and understand.
I agree that we should pick something that is good for people and write a script for this special format. E.g.
It wasn't my intention but it looks like Markdown, which seems like a good compromise 🙂 |
I prefer having only one DB; it's easier to maintain.
We need to make it as simple and short as possible, so that when we have thousands of abbreviations the DB won't be 10 GB. |
I don't have a lot of experience with database formats, so I can't give good advice. Remember how I found this project by searching for the However I do want it to be human readable, we don't need to make it machine readable unless we make a tool/robot in the future that can automatically import from issues. But this is just my personal opinion, it depends on how you define future needs and find the best solution.
I don't think it will ever be 10 GB, since this is just plain text, but documents over 100 MB are often difficult for a text editor to load. I also don't think it will exceed 100 MB; maybe 30 MB at most, including all languages and 5000+ words. |
Honestly, the format you guys are thinking about looks a lot like yaml. I like this format, except it's whitespace sensitive. Yaml also supports comments, which can be helpful, especially if you include comments in your database. We may not have to reinvent the wheel, yaml is fine with me. |
I don't see it as YAML but as plain text. JSON needs {}, [] and "" to be valid, and after a while your eyes glaze over, not to mention the space it takes and the shape the database would have (no thanks). I don't like YAML, and I don't have a good explanation why. So that can be our own abbreviation format. Maybe the DB name can be. |
Yes, and JSON needs a proper viewer to be easy to read, so I really don't like it. I often use nano to edit text in the terminal and, to put it bluntly, I hate JSON. Given that this is a GitHub repository, we'll probably be editing directly in GitHub's web editor, and typing JSON in a browser would suck.
My biggest pet peeve with YAML is indentation, especially space indentation, which is very error prone.
Yes, this can be our own format; there is no need to follow YAML or JSON. I like the freedom and manageability of that. Anyway, we can write a script to convert it to any format, so no problem. |
Speaking of scripts and conversions: which language should we write the scripts in? |
Please consider not using Or use It seems to me that this is the yaml 👇, maybe the spaces are out of specification, I didn't double check.
|
I like bash. It can be run directly in GitHub CI. |
Like this format.
Great. |
That's yaml, lol.
Yep, but in bash you can't match a string directly, you have to rely on external programs like |
ChatGPT says otherwise.
|
No, I mean, you can't get matching words directly from the database, you have to rely on In bash you do something like this:
So I'm not sure it will be as efficient as a |
Rationale:
Since this project mostly benefits non-native English speakers, it would be more helpful if there were translations into their languages.
Suggestion:
Here's an example translation from Japanese Wikipedia:
That way, when non-native speakers see an abbreviation in code whose meaning they don't understand, they can look up its original word in reverse; it also helps them come up with better abbreviations themselves.
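The reverse lookup could be sketched as follows, assuming a word-keyed database with a `recommended` abbreviation and optional `translations` per entry. The field names follow the JSON proposal in the comments, and the `switch` entry is a made-up example of two words sharing one abbreviation:

```python
from collections import defaultdict

# Hypothetical sample data; "switch" only illustrates a shared abbreviation.
DB = {
    "software": {"translations": {"JA": "ソフトウェア"}, "recommended": "sw"},
    "switch": {"translations": {}, "recommended": "sw"},
}


def build_reverse_index(db):
    """Map each abbreviation to the list of words it may stand for."""
    index = defaultdict(list)
    for word, entry in sorted(db.items()):
        abbr = entry.get("recommended")
        if abbr:
            index[abbr].append(word)
    return dict(index)


def lookup(db, abbr, lang):
    """Return (word, translation or None) pairs for an abbreviation."""
    index = build_reverse_index(db)
    return [
        (word, db[word].get("translations", {}).get(lang))
        for word in index.get(abbr, [])
    ]


if __name__ == "__main__":
    print(lookup(DB, "sw", "JA"))
```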
Before getting started, #46 needs to be considered to make contributing to the project easier.
This issue is a fork of a previous thread (#41); you may be interested in reading the earlier discussion there.