Request: translate this project #48

oood · 2023-03-03T21:15:35Z

Rationale:

Since this project benefits non-native English speakers the most, it would be more helpful if translations were available in their own languages.

Suggestion:

Here's an example translation from Japanese Wikipedia:

DB データベース (Database)

That way, when a non-native speaker sees an abbreviation in code that they don't understand, they can look up the original word. It would also help non-native speakers choose better abbreviations.

Before getting started, #46 needs to be considered to make contributing to the project easier.

This issue is a fork of a previous thread (#41); you may be interested in reading the earlier discussion there.

kisvegabor · 2023-03-04T15:22:25Z

Do you mean having separate files for each language, or adding the translations of the original word next to that word? E.g.

software, 软件, ソフトウェア • 🟡 sw { computer science }

oood · 2023-03-04T16:59:05Z

Do you mean having separate files for each language

Of course, each language needs to have separate files ( README_ja-JP.md, README_it-IT.md, README_pt-BR.md ), otherwise that would be messy.

kisvegabor · 2023-03-04T18:45:39Z

I agree that it'd be more manageable this way.

I assume we will have max 1-2 new abbreviations or changes per week so it's only a little bit of extra work to maintain the translations.

However, we would need committed contributors for each language, or else the translations will easily get out of sync.

Maybe we should store the abbreviations in a yaml/json file like:

{
	"software": {
		"translations": {
			"ZH": "软件",
			"JA": "ソフトウェア"
		},
		"recommended": "sw",
		"not recommended": ["softw", "sware"],
		"context based": {
			"some context": "blah",
			"some context2": "blah2"
		}
	}
}

and from this file we can easily generate the READMEs with CI.
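A minimal sketch of that generation step, in Python, assuming the JSON layout proposed above. The function name `render_readme_lines` and the shape of the output lines are illustrative assumptions, not something the project has settled on.

```python
import json

# Hypothetical sample that follows the JSON layout proposed above.
DB_JSON = """
{
  "software": {
    "translations": {"ZH": "软件", "JA": "ソフトウェア"},
    "recommended": "sw",
    "not recommended": ["softw", "sware"]
  }
}
"""

def render_readme_lines(db, lang=None):
    """Return one README line per word; `lang` selects a translation code."""
    lines = []
    for word in sorted(db):
        entry = db[word]
        label = word
        translation = entry.get("translations", {}).get(lang) if lang else None
        if translation:
            # Show the translation first and keep the English word next to it.
            label = f"{translation} ({word})"
        lines.append(f"{label} • {entry['recommended']}")
    return lines

db = json.loads(DB_JSON)
print("\n".join(render_readme_lines(db)))        # English README lines
print("\n".join(render_readme_lines(db, "JA")))  # Japanese README lines
```

A CI job could run a script like this on every push and write one README per language, so only the single JSON file is edited by hand.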

T1xx1 · 2023-03-04T18:54:47Z

I agree with @kisvegabor: generating the READMEs from JSON would be easier to maintain, and at the same time it would solve the JSON problem for creating the website.

T1xx1 · 2023-03-04T18:59:09Z

We should also have something like an index file listing only the languages we provide.

[
"sp",
"it"
and so on
]

So we iterate over this array and only take the en abbr and the corresponding translation in the object.
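Roughly, that loop could look like the following Python sketch. The index content, the database excerpt and the helper name `pairs_for_language` are all assumptions made for illustration.

```python
import json

# Hypothetical index file content: the language codes the project provides.
INDEX_JSON = '["sp", "it"]'

# Hypothetical database excerpt with sample translations.
DB = {
    "software": {
        "recommended": "sw",
        "translations": {"it": "software", "fr": "logiciel"},
    }
}

def pairs_for_language(db, lang):
    """Yield (translation, english_abbr) for words translated into `lang`."""
    for word, entry in db.items():
        translation = entry.get("translations", {}).get(lang)
        if translation is not None:
            yield translation, entry["recommended"]

# Only languages listed in the index are processed; "fr" is skipped here.
for lang in json.loads(INDEX_JSON):
    print(lang, list(pairs_for_language(DB, lang)))
```

Languages that are in the database but not in the index are simply ignored, so a translation can be collected gradually before its README is published.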

kisvegabor · 2023-03-04T18:59:09Z

Great! @oood suggested the same thing in #47 so it seems we are on the same page.

Let's find a format then! You can see my idea above, but feel free to suggest something completely different.

oood · 2023-03-04T23:23:14Z

However, we would need committed contributors for each language, or else the translations will easily get out of sync.

When we start a new language, we can assume that the first contributor has translated 100% of the current abbreviations/words, so it doesn't matter much if it falls out of sync later. Words added later that are not yet translated will make it easier for people reading the pages in that language to realize that contributions are needed, like the red links on Wikipedia for pages that don't exist yet. Untranslated words could also link to a contribution guide or something similar to encourage people to submit translations.

DB データベース (Database)
software, [translate it, link] • 🟡 sw { computer science }
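The fallback for untranslated words could be sketched like this in Python. The file name `CONTRIBUTING.md` and the function `translated_label` are hypothetical; the thread does not name a contribution guide.

```python
# Hypothetical location of a contribution guide; the thread does not name one.
CONTRIBUTING_LINK = "CONTRIBUTING.md"

def translated_label(word, translations, lang):
    """Return the translation, or a call-to-action link if it is missing."""
    translation = translations.get(lang)
    if translation:
        return f"{translation} ({word})"
    # No translation yet: keep the English word and invite a contribution.
    return f"{word}, [translate it]({CONTRIBUTING_LINK})"

print(translated_label("database", {"ja": "データベース"}, "ja"))
print(translated_label("software", {}, "ja"))
```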

Let's find a format then! You can see my idea above, but feel free to suggest something completely different.

I think it would be great to do that.

We need to first determine what content a word may require.

oood · 2023-03-04T23:35:20Z

We should also have something like an index file listing only the languages we provide.
[
"sp",
"it"
and so on
]
So we iterate over this array and only take the en abbr and the corresponding translation in the object.

Yes, initially we only maintain EN.

We now need to determine what information an abbreviation contains:

Word, abbreviation, context, recommendation, link to issue and ?

BTW, our sorting should be based on abbreviations rather than words themselves, because an abbreviation may correspond to many words, and putting them together is helpful for retrieval.

T1xx1 · 2023-03-05T11:44:11Z

BTW, our sorting should be based on abbreviations rather than words themselves, because an abbreviation may correspond to many words, and putting them together is helpful for retrieval.

Nope. We swapped the list order a few PRs ago. When someone is searching for an abbreviation, they only know the word, so it makes more sense to leave it like that.

I assure you that sorting them by abbr was a mess.

kisvegabor · 2023-03-05T11:45:00Z

Word, abbreviation, context, recommendation, link to issue and ?

We need "not recommended" as well.

Nope. We swapped the list order a few PRs ago. When someone is searching for an abbreviation, they only know the word, so it makes more sense to leave it like that.

I just wanted to comment the same 🙂

T1xx1 · 2023-03-05T12:36:25Z

En README.
• _word_ • _recommendation degree_ [_abbr_](link to discussion) { _context if context-sensitive_ }
Sorted by word.

Any other lang README.
• (translation) _word_ • _recommendation degree_ [_abbr_](_link to discussion_) { _context if context-sensitive_ }
Sorted by translation.
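To make the two layouts concrete, here is a small Python sketch. The entries, the `#` links, the 🟢 marker and the function names are placeholders I made up; only the line shape follows the proposal above.

```python
# Hypothetical entries; the links and the degree markers are placeholders.
ENTRIES = [
    {"word": "software", "degree": "🟡", "abbr": "sw",
     "link": "#", "context": "computer science", "it": "software"},
    {"word": "database", "degree": "🟢", "abbr": "db",
     "link": "#", "context": None, "it": "base di dati"},
]

def format_line(entry, lang=None):
    """Build one README bullet in the shape proposed above."""
    prefix = f"({entry[lang]}) " if lang else ""
    context = f" {{ {entry['context']} }}" if entry["context"] else ""
    return (f"• {prefix}{entry['word']} • {entry['degree']} "
            f"[{entry['abbr']}]({entry['link']}){context}")

def render(entries, lang=None):
    """Sort by word for English, by translation for any other language."""
    key = (lambda e: e[lang]) if lang else (lambda e: e["word"])
    return [format_line(e, lang) for e in sorted(entries, key=key)]

print("\n".join(render(ENTRIES)))
print("\n".join(render(ENTRIES, "it")))
```

Note that plain string sorting is not locale-aware, so a real script would probably need a proper collation step for some languages.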

T1xx1 · 2023-03-05T12:39:33Z

Realized that we can also use

[
"sp",
"it"
and so on
]

to create an index README with the available langs.
So if someone is looking for a specific language, they can go there without browsing through the code.
We should add a link to the en README, like
[Other translations](...)

oood · 2023-03-05T14:34:36Z

As this project grows, I'm not sure if a readme file can be as long as a dictionary.

Maybe split it alphabetically?

abbreviations-in-code
├─── readme.md
├─── en-US
|	├─── A.md
|	├─── B.md
|	├─── ...
|	└─── Z.md
├─── it-IT
|	├─── A.md
|	├─── ...
|	├─── Z.md
|	└─── readme.md
├─── raw
|	└─── raw
└─── docs
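Splitting into per-letter pages like the tree above could be done with something like this Python sketch. The function name `group_by_initial` is an assumption, and it only handles words that start with a Latin letter; other scripts would need a different bucketing rule.

```python
from collections import defaultdict

def group_by_initial(words):
    """Map file names such as 'A.md' to the sorted words they would hold."""
    pages = defaultdict(list)
    for word in sorted(words, key=str.lower):
        # The first letter, upper-cased, decides which page the word goes to.
        pages[f"{word[0].upper()}.md"].append(word)
    return dict(pages)

print(group_by_initial(["software", "database", "address", "array"]))
```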

oood · 2023-03-05T15:16:13Z

I assure you that sorting them by abbr was a mess.

I just wanted to comment the same 🙂

OK, I get it: words are unique, but abbreviations are not. I now agree with that idea.

We need "not recommended" as well.

Of course.

En README.
• word • recommendation degree [abbr](link to discussion) { context if context-sensitive }
Sorted by word.

Another thing I think we can improve is context: some words don't make sense on their own even when spelled out in full, so they need a description or a link to a Wikipedia page for people to understand what they mean.

to create an index README with the available langs.
So if someone is looking for a specific language, they can go there without browsing through the code.
We should add to the en README a link like

That would be great.

How about this?
If a language's page doesn't exist, we copy an English page under that language to encourage people to contribute.

oood · 2023-03-05T15:21:36Z

Maybe we should store the abbreviations in a yaml/json file like:

Is it possible to create separate json for each language?

I think a huge JSON file will become complicated to maintain as the project grows; after all, it requires manual merging.

Or simply use a plain text file, with one word per line and fields separated by commas.

Because the raw data has to be maintained by humans, we don't have to make it machine-readable; machines can adapt with scripts.

kisvegabor · 2023-03-06T11:25:07Z

As this project grows, I'm not sure if a readme file can be as long as a dictionary.

I agree.

├─── en-US
| ├─── A.md
| ├─── B.md
| ├─── ...
| └─── Z.md

It depends on whether we add the acronyms or not. Without them we could have only 1 file/language which would be easier to read and search.

Is it possible to create separate json for each language?

I don't think it's a good idea because this way we need to maintain and keep in sync the whole _recommendation degree_ [_abbr_](_link to discussion_) { _context if context-sensitive_ } part for each language.

I suggest having a single DB file with all words, abbreviations, links, translations, etc. It can be large, but it has a simple structure which is easy to follow and understand.

Because raw data has to be maintained by humans, we don't have to make it machine readable, machines can adapt with scripts.

I agree that we should pick something that is good for people and write a script for this special format. E.g.

# software
- issue: 123  
- translations
   - zh: ...   
   - ja: ... 
- abbreviations
   - recommended: sw
   - context sensitive: foo1 (context1), foo2 (context2)
   - not recommended: softw, sware

It wasn't my intention but it looks like Markdown, which seems like a good compromise 🙂
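A script for this special format does not have to be big. Below is a minimal Python sketch of a parser, assuming exactly the layout shown above: `# word` headings, top-level `- key: value` fields, and top-level `- key` lines that open an indented section. The name `parse_db` is hypothetical.

```python
SAMPLE = """\
# software
- issue: 123
- translations
   - zh: 软件
   - ja: ソフトウェア
- abbreviations
   - recommended: sw
   - not recommended: softw, sware
"""

def parse_db(text):
    """Parse the Markdown-like format into nested dicts keyed by word."""
    db, entry, section = {}, None, None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue
        if line.startswith("# "):
            # A heading starts a new word entry.
            entry = db.setdefault(line[2:].strip(), {})
            section = None
            continue
        if entry is None:
            continue  # ignore anything before the first word heading
        indented = line[0].isspace()
        item = line.strip().lstrip("-").strip()
        key, _, value = item.partition(":")
        key, value = key.strip(), value.strip()
        if indented and section is not None:
            section[key] = value      # field inside the open section
        elif value:
            entry[key] = value        # plain top-level field
            section = None
        else:
            section = entry.setdefault(key, {})  # opens a new section
    return db

print(parse_db(SAMPLE))
```

Once the data is in a dict like this, converting it to JSON or to the README lines is a separate, small step.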

T1xx1 · 2023-03-06T11:35:50Z

# software
- issue: 123  
- translations
   - zh: ...   
   - ja: ... 
- abbreviations
   - recommended: sw
   - context sensitive: foo1 (context1), foo2 (context2)
   - not recommended: softw, sware
It wasn't my intention but it looks like Markdown, which seems like a good compromise 🙂

I prefer having only one DB; it's easier to maintain.

# **word**
- translations only if has translations
	- it: ...
	- sp: ...
- abbr: **abbr**
	- degree: red/purple/yellow/green
	- context: **context** only if needed

We need to make it as simple and short as possible, so that when we have thousands of abbreviations the DB will not be 10 GB.

oood · 2023-03-06T13:47:41Z

It wasn't my intention but it looks like Markdown, which seems like a good compromise 🙂
I prefer having only one DB; it's easier to maintain.

I don't have a lot of experience with database formats, so I can't give good advice.

Remember how I found this project by searching for "db"? I'm working on a function that dumps a database format into a human-readable form, and that is how I came across this project.

However, I do want it to be human-readable. We don't need to make it machine-readable unless we build a tool/robot in the future that can automatically import from issues. But this is just my personal opinion; it depends on how you define future needs and find the best solution.

We need to make it as simple and short as possible, so that when we have thousands of abbreviations the DB will not be 10 GB.

I don't think it will ever be 10 GB, since this is just plain text, although documents over 100 MB are often difficult for a text editor to load. But I also don't think it will be more than 100 MB; maybe 30 MB at most, including all languages and 5000+ words.
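As a back-of-the-envelope check, here is the same kind of estimate in Python. Every input below (100 languages, 200 bytes of fixed data per word, 40 bytes per translation) is a guess made up for illustration, not a measurement of the real data.

```python
def estimate_bytes(words, languages, base_bytes, bytes_per_translation):
    """Rough size of a single plain-text DB, in bytes."""
    return words * (base_bytes + languages * bytes_per_translation)

# All four inputs are guesses for illustration, not measurements.
size = estimate_bytes(words=5000, languages=100,
                      base_bytes=200, bytes_per_translation=40)
print(f"{size / 1_000_000:.1f} MB")
```

With these guesses the result stays in the tens of megabytes, far below 10 GB.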

oood · 2023-03-06T18:07:17Z

# software
- issue: 123  
- translations
   - zh: ...   
   - ja: ... 
- abbreviations
   - recommended: sw
   - context sensitive: foo1 (context1), foo2 (context2)
   - not recommended: softw, sware

# **word**
- translations only if has translations
	- it: ...
	- sp: ...
- abbr: **abbr**
	- degree: red/purple/yellow/green
	- context: **context** only if needed

Honestly, the format you guys are thinking about looks a lot like yaml. I like this format, except it's whitespace sensitive.

Yaml also supports comments, which can be helpful, especially if you include comments in your database.

We may not have to reinvent the wheel, yaml is fine with me.

T1xx1 · 2023-03-06T18:17:53Z

I don't see it as YAML but as plain text. JSON needs {}, [] and "" to be valid, and after a while your eyes go on vacation, not to mention the space it takes and the formatting the database would need (no thanks). I don't like YAML, though I don't have a good explanation why. So this can be our own abbr format. Maybe the DB name can be Main.abbr.txt.

oood · 2023-03-06T18:30:23Z

I don't see it as yaml but text.

JSON needs {}, [] and "" to be valid, and after a while your eyes go on vacation, not to mention the space it takes and the formatting the database would need (no thanks).

Yes, and JSON needs a proper reader to be easy to read, so I really don't like it. I often use nano to edit text in the terminal and, to put it bluntly, I hate JSON. Given that this is a GitHub repository, we'll probably be editing directly with GitHub's web editor, and typing JSON in a browser would suck.

I don't like yaml and I don't have a good explanation. So that can be our abbr format.

The biggest pet peeve of yaml for me is indentation, especially space indentation, which is very error prone.

Maybe the DB name can be Main.abbr.txt.

Yes, this can be our own format, no need to follow yaml or json.

I like the freedom and manageability of the format. Anyway we can write a script to convert it to any format, so no problem.

T1xx1 · 2023-03-06T18:37:37Z

Speaking of scripts and conversions: in which language should we write the scripts?
We need a language that can run locally to generate the files. Previously the script was written in Perl, but I suggest Node.js or bash. Preferably an interpreted language, so we don't have to compile anything.

oood · 2023-03-06T18:41:28Z

# software
# **word**

Please consider not using # as headings, because we can create comments in the format, so that our script can skip commented lines.

Or use /* */ to delimit comments.

It seems to me that this is the yaml 👇, maybe the spaces are out of specification, I didn't double check.

# this is a comment.
word:
- translations:
	- it: ...
- abbr:
	- degree: red/purple/yellow/green
	- context: **context** only if needed
- abbr2:
	- ...:
# new one
word2:
- ......
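Skipping commented lines is cheap for a script. Here is a minimal Python sketch, assuming that a comment is any line whose first non-space character is `#`; the helper names are hypothetical.

```python
SAMPLE = """\
# this is a comment.
word:
- abbr:
# new one
word2:
"""

def meaningful_lines(text):
    """Drop blank lines and lines whose first non-space character is '#'."""
    return [line for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]

def top_level_words(text):
    """Return the unindented 'word:' keys, ignoring comments."""
    return [line[:-1] for line in meaningful_lines(text)
            if not line[0].isspace() and not line.startswith("-")
            and line.endswith(":")]

print(meaningful_lines(SAMPLE))
print(top_level_words(SAMPLE))
```

This is also why `#` cannot be both a heading marker and a comment marker in the same file: the script would have no way to tell them apart.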

oood · 2023-03-06T18:44:11Z

We need a language that can run locally to generate the files. Previously the script was written in Perl, but I suggest Node.js or bash. Preferably an interpreted language, so we don't have to compile anything.

I like bash. It can be run directly in GitHub CI.

T1xx1 · 2023-03-06T18:57:57Z

# this is a comment.
word:
- translations:
	- it: ...
- abbr:
	- degree: red/purple/yellow/green
	- context: **context** only if needed
- abbr2:
	- ...:
# new one
word2:
- ......

Like this format.

I like bash. It can be run directly in GitHub CI.

Great.

oood · 2023-03-06T19:30:14Z

Like this format.

That's yaml, lol.

Great.

Yep, but in bash you can't match a string directly; you have to rely on external programs like awk.

T1xx1 · 2023-03-06T20:49:22Z

ChatGPT says otherwise.

In Bash, you can use pattern matching to check if a string matches a particular format. Here are some examples:

To check if a string matches a specific word:

bash
if [[ "$string" == "hello" ]]; then
  echo "The string is 'hello'"
fi

To check if a string starts with a particular prefix:

bash
if [[ "$string" == "prefix"* ]]; then
  echo "The string starts with 'prefix'"
fi

To check if a string ends with a particular suffix:

bash
if [[ "$string" == *"suffix" ]]; then
  echo "The string ends with 'suffix'"
fi

To check if a string contains a particular substring:

bash
if [[ "$string" == *"substring"* ]]; then
  echo "The string contains 'substring'"
fi

To check if a string matches a regular expression pattern:

bash
if [[ "$string" =~ ^[0-9]+$ ]]; then
  echo "The string consists of only digits"
fi

Note that the [[ and ]] are special operators in Bash that allow for more advanced pattern matching than the single [ and ] brackets used in the older test command.

oood · 2023-03-06T21:20:10Z

ChatGPT says otherwise.

No, I mean you can't get matching words directly from the database; you have to rely on awk or something.

In bash you do something like this:

if [ "word" = "$word" ]; then
    # Use "awk" to get the "$abbr" from the matching entry.
    :
fi

So I'm not sure it will be as efficient as a Perl script, since bash has to call a second program: its builtins are not suited to getting the abbr directly from the word.

oood mentioned this issue Mar 3, 2023

Some advice about the licenses, contributions and more friendliness #41

Closed

kisvegabor mentioned this issue Mar 4, 2023

Request: Make the project more inclusive and accessible #47

Open

oood mentioned this issue Mar 4, 2023

Request: switch license #45

Open

oood mentioned this issue Mar 5, 2023

Request: make this project a dictionary #46

Closed

T1xx1 mentioned this issue Mar 5, 2023

📖 Dictionary project (plan) #51

Closed

23 tasks

T1xx1 added this to the Dictionary project milestone Mar 7, 2023

T1xx1 self-assigned this Nov 21, 2023
