Skip to content
This repository has been archived by the owner on May 2, 2022. It is now read-only.

Sync missing translation keys in app/locales/*.json files to a Google Sheet #481

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

michaelmcmillan
Copy link
Member

@michaelmcmillan michaelmcmillan commented Mar 29, 2020

The script is heavily commented with what I've been thinking. There might be a logical error here, so please read through the script to see if it makes sense.

This script will produce false positives (since it looks at all changes to the file ever). Why did I make it like that? Because translations are added at different points in time – a new feature branch will contain new translations, while all the existing branches and locales will be missing those.

This will produce some false positives (translations which have been removed etc.), but it in turn include translations which are being worked on right now in different branches (which should allow us to start translating before the PR is done, and widen the bottleneck which is currently halting PRs). We could also just add the false positives to an ignore list to make them go away.

I'm hoping this should make it easier for us to quickly grab all missing translations for all locales and then handing them off to a non-technical person that don't know JSON or git, maybe via. a more friendly medium like Google Sheets.

This is what I get if I run the script right now for no, en-IN and se.

no missing translation for: Sweden
... basically means that the Norwegian locale (no.json) has never had a row in its JSON file where the key is Sweden and the value is Sverige.

Does this make sense? Haha, I'm actually not sure. Here's the full output:

no missing translation for: Sweden
no missing translation for: https://www.whatismyzip.com
no missing translation for: What is my Zip code?
---
se missing translation for: this
se missing translation for: https://who.org
se missing translation for:  article as an attempt to chart these numbers.
se missing translation for: In total <%= numberWithSpaces(totalPeopleInContactWithInfected) %> people have reported that they have been in close contact with a person who was tested positive for COVID-19.
se missing translation for: In total <%= numberWithSpaces(totalInfectedPeopleWithSymptoms) %> people have reported that they have tested positive for COVID-19 and experience symptoms.
se missing translation for: In total <%= numberWithSpaces(totalPeopleInContactWithInfected) %> people have reported that they have been tested for COVID-19.
se missing translation for: Join the most important crowdsource! Regardless if you're healthy or not, please submit the form below – that is also valuable information!
se missing translation for: Zip code information
se missing translation for: Netherlands
se missing translation for:  if you can't get it to work.
se missing translation for: I agree to my being data in accordance with the privacy statement
se missing translation for: To WHO.org
se missing translation for: /healthcondition/
se missing translation for: Using self-report, you can get a better picture of how many people have symptoms without exposing health workers to potential contamination hazards and without using up valuable infection control equipment that is already in short supply. We at Bustbyte, well helped by other volunteers, have created this tool in response to
se missing translation for: In total <%= numberWithSpaces(totalPeopleWithSymptoms) %> people have reported that they experience symptoms.
---
en-IN missing translation for: this
en-IN missing translation for: https://who.org
en-IN missing translation for:  article as an attempt to chart these numbers.
en-IN missing translation for: In total <%= numberWithSpaces(totalPeopleInContactWithInfected) %> people have reported that they have been in close contact with a person who was tested positive for COVID-19.
en-IN missing translation for: In total <%= numberWithSpaces(totalInfectedPeopleWithSymptoms) %> people have reported that they have tested positive for COVID-19 and experience symptoms.
en-IN missing translation for: In total <%= numberWithSpaces(totalPeopleInContactWithInfected) %> people have reported that they have been tested for COVID-19.
en-IN missing translation for: Join the most important crowdsource! Regardless if you're healthy or not, please submit the form below – that is also valuable information!
en-IN missing translation for: Zip code information
en-IN missing translation for: Netherlands
en-IN missing translation for:  if you can't get it to work.
en-IN missing translation for: I agree to my being data in accordance with the privacy statement
en-IN missing translation for: To WHO.org
en-IN missing translation for: /healthcondition/
en-IN missing translation for: Using self-report, you can get a better picture of how many people have symptoms without exposing health workers to potential contamination hazards and without using up valuable infection control equipment that is already in short supply. We at Bustbyte, well helped by other volunteers, have created this tool in response to
en-IN missing translation for: In total <%= numberWithSpaces(totalPeopleWithSymptoms) %> people have reported that they experience symptoms.

@michaelmcmillan michaelmcmillan changed the title Lost in translation script in JavaScript rather than Python Sync missing translation keys in app/locales/*.json files to a Google Sheet Mar 29, 2020
@fossecode
Copy link
Member

fossecode commented Mar 29, 2020

Nice! Couple of questions:

  1. When and who runs the script? On PRs?
  2. Does it make sense to run it on all branches and all commits, or should we only run it on the current state on the branch you are on?
  3. Any reasons javascript is used instead of typescript?

*
* So what we do here is simply to find *all* the english translation keys across the entire
* project. That means looking at all the english translations keys in all the locale files
* since the start of the project across all branches and PRs. We throw them into a set that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds unnecessary to me, as it would also include WIP's and stale branches. But it might be some cases where this makes sense that I have not thought about?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the part that sucks, but having it would make sure that all translations are added and (hopefully) translated by the time we want to merge it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I understand. So a possible solution would be to run this if this run as a cronjob or something, so that the sheet is updated as soon as there are new texts in the repo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so in the case with Stano's PR, those translations are now added to the sheet (even though his PR is WIP). Not sure if we want that or not, but we could try it out?

if (translationKeysByLocale[locale] === undefined) {
translationKeysByLocale[locale] = new Set([]);
}
translationKeysByLocale[locale] = new Set([...translationKeysByLocale[locale], ...translationKeys]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel stuff like this is mentally much easier to visualize when translationKeysByLocale is typed :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can tweak that, wanted to get it out quickly before bed

@michaelmcmillan
Copy link
Member Author

michaelmcmillan commented Mar 29, 2020

Nice! Couple of questions:

  1. When and who runs the script? On PRs?

Not sure what's best. PRs would work

  1. Does it make sense to run it on all branches and all commits, or should we only run it on the current state on the branch you are on?

See comment above.

  1. Any reasons javascript is used instead of typescript?

Wanted to crank it out fast. Types can be added.

for (let sheetIndex = 0; sheetIndex < doc.sheetCount; sheetIndex++) {
const sheet = doc.sheetsByIndex[sheetIndex];
for (const locale of allLocales) {
if (sheet.title !== locale) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, there is one sheet per locale?

Is it then necessary to loop through the sheets here, can't we just loop through the locales and try to add them (from the code it seems like it throws an error if it does not exist)?

try {
          await doc.addSheet({ title: locale, headerValues: ['key', 'translation'] });
        } catch (error) {
          // We don't do anything if the sheet for this locale exists.
        }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, take a look at the sheet here: https://docs.google.com/spreadsheets/d/1ILFfc1DX4ujMnLnf9UqhwQGM9Ke3s1cAWciy8VqMHZw/edit#gid=462835362

The reason I'm looping over locales is because that way we will create sheets for new locales automatically. There is no way in the API to retrieve the sheet names in the spreadsheet. This way we don't have to worry about keeping locales and sheets in sync (it will sort it self out).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't read your comment thoroughly enough. Yeah, maybe we can get away without that loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha getting late... The reason we have to loop over those are because there is no way of retrieving a sheet by its name. Does that make sense? The doc.addSheet doesn't return the sheet if it doesn't exist, it only throws an exception. So we have to, at some point, look at all the sheets to check if they match the current locale we are looping over.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I thought the for-loop ended after adding a sheet. A little hard to wrap my head around the logic, but as long as it works it is more than good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Just refactored it, better now?

@fossecode
Copy link
Member

Great job, I think it will make it much easier to handle translations :)

@michaelmcmillan
Copy link
Member Author

Just tested this on @sbocinec's PR #483 and it seems to have worked 👍

Screenshot 2020-03-30 at 00 09 17

@fossecode
Copy link
Member

Just tested this on @sbocinec's PR #483 and it seems to have worked 👍

Screenshot 2020-03-30 at 00 09 17

Haha nice, also found this bug from looking at the picture: b2988f4

@fossecode
Copy link
Member

fossecode commented Mar 29, 2020

Although a few of the keys are not part of the translations anymore. Maybe add a whitelist or something where we can declare those?

for (const locale of allLocales) {

// Create sheet for this locale if it doesn't already exist.
let matchingSheet = sheets.find(sheet => sheet.title === locale);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, much better 👍

@michaelmcmillan
Copy link
Member Author

Next step before it's ready to merge:

  1. Find a way of git fetching all PRs to local.
  2. Add sync in the other direction (from Sheet to locale.json).

@trkoch
Copy link

trkoch commented Apr 10, 2020

@michaelmcmillan hub (an extension to git) might be of use for fetching PRs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants