[FEAT] Python scripting: format GSheet csv to json via Google API data pull #665

Open
1 task done
ngiangre opened this issue Apr 29, 2020 · 7 comments · Fixed by #679
Assignees
Labels
Backend Issues pertaining to the Back end team

Comments

@ngiangre
Contributor

⚠️ IMPORTANT: Please fill out this template to give us as much information as possible to consider/implement the feature.

Prerequisites

  • check this box if you have completed the following:
    • Reviewed the contributing guidelines and support files
    • Reviewed the README file for the repository you are working in
    • Searched for relevant instructions on our Discord server
    • Searched the issues of the repository you are working in to make sure one was not already filed

Summary

This is a modular issue sprouting from #643

All the translations are now in the correct format. Next, we need to pull the sheets (English, Spanish, French, Italian, Dutch (Netherlands), Russian) via the Google API into the repository so front-end developers can reference them by their JSON keys.

There is a link in the Python script src/python/pull_gsheet_data.py for creating your own Google API key. This script is just a start and needs more development.

Each translation sheet has the columns parentKey, childKey, fieldKey, value, translatedValue, and then other columns. We need a structure of { 'parentKey' : { 'childKey' : { 'fieldKey' : { 'value' : '', 'translatedValue' : '', ... } } } }. In the case of education, each childKey maps to a list of entries instead: { 'parentKey' : { 'childKey' : [ {'value': '', ... }, {'value' : '', ... }, ... ] } }
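
As a rough illustration of that structure, here is a minimal sketch that nests flat rows by the three key columns. It assumes each sheet row has already been read into a dict keyed by column name, and that education rows can be recognised by a parentKey of "education". The function name `nest_rows`, the `list_parents` parameter, and the sample rows are all hypothetical, not taken from pull_gsheet_data.py.

```python
def nest_rows(rows, list_parents=("education",)):
    """Nest flat sheet rows as parentKey -> childKey -> fieldKey -> payload.

    For parents named in list_parents (an assumption), each childKey maps
    to a list of payload dicts instead of a fieldKey mapping.
    """
    key_columns = ("parentKey", "childKey", "fieldKey")
    nested = {}
    for row in rows:
        # Everything that is not a key column becomes the leaf payload.
        payload = {col: val for col, val in row.items() if col not in key_columns}
        children = nested.setdefault(row["parentKey"], {})
        if row["parentKey"] in list_parents:
            children.setdefault(row["childKey"], []).append(payload)
        else:
            children.setdefault(row["childKey"], {})[row["fieldKey"]] = payload
    return nested


# Hypothetical sample rows, for illustration only.
rows = [
    {"parentKey": "symptoms", "childKey": "fever", "fieldKey": "title",
     "value": "Fever", "translatedValue": "Fiebre"},
    {"parentKey": "education", "childKey": "facts", "fieldKey": "",
     "value": "Wash hands", "translatedValue": "Lávese las manos"},
    {"parentKey": "education", "childKey": "facts", "fieldKey": "",
     "value": "Stay home", "translatedValue": "Quédese en casa"},
]
nested = nest_rows(rows)
```

With these rows, `nested["symptoms"]["fever"]["title"]` is a dict holding value and translatedValue, while `nested["education"]["facts"]` is a list with two entries.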

The Date attributes are not needed - filter those out.
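
One possible way to filter those out, assuming the date attributes are columns whose header contains the word "date" (the column names below are made up for illustration, and `drop_date_columns` is a hypothetical helper):

```python
def drop_date_columns(rows):
    """Return copies of the rows without columns whose name contains 'date'."""
    return [
        {col: val for col, val in row.items() if "date" not in col.lower()}
        for row in rows
    ]


# Hypothetical row with two date-like columns.
rows = [
    {"parentKey": "symptoms", "value": "Fever",
     "createdDate": "2020-04-29", "Date Modified": "2020-04-30"},
]
cleaned = drop_date_columns(rows)
```

A substring match is deliberately crude: it would also drop a column such as "update", so an explicit list of column names may be safer once the real headers are known.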

The resulting JSON files should go into public/locales/, though for now I put them in docs/content so as not to disrupt anything.
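
A minimal sketch of the write step, assuming one sub-directory per two-letter language code and a file named translation.json inside it. That layout is an assumption, and `write_translation` is a hypothetical helper; only the public/locales/ target comes from this issue.

```python
import json
from pathlib import Path


def write_translation(nested, language_code, out_dir="public/locales"):
    """Write one language's nested dict to <out_dir>/<code>/translation.json."""
    target = Path(out_dir) / language_code / "translation.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented and Cyrillic text readable in the file.
        json.dump(nested, fh, ensure_ascii=False, indent=2)
    return target
```

Pointing `out_dir` at docs/content instead keeps the output away from the front end while the format is still being reviewed.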

Here's some extra code I already started using that might be helpful:
```python
# Init sheet names and output dir
data_model_sheet_name = "Data Model"
education_sheet_name = "Education"
health_sheet_name = "Health"
translation_sheets_regex = " - Master Sheet"
translation_sheets_not_regex = "OLD"
languages_to_pull = ['English', 'Dutch (Netherlands)', 'Spanish', 'Italian', 'French', 'Russian']
out_dir = "../../docs/content/"

# Set dictionary to connect languages to two-letter abbreviations
# https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
language_letters_dict = {'English': 'en', 'Dutch': 'nl', 'Spanish': 'es',
                         'Italian': 'it', 'French': 'fr', 'Russian': 'ru'}
```
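
For anyone picking this up, here is a sketch of how those settings might be used to pick the right sheets. It assumes sheet titles look like "<Language> - Master Sheet" and uses plain substring checks rather than real regular expressions. The helper names and the sample titles are hypothetical.

```python
# Values copied from the snippet above.
translation_sheets_regex = " - Master Sheet"
translation_sheets_not_regex = "OLD"
languages_to_pull = ['English', 'Dutch (Netherlands)', 'Spanish',
                     'Italian', 'French', 'Russian']
language_letters_dict = {'English': 'en', 'Dutch': 'nl', 'Spanish': 'es',
                         'Italian': 'it', 'French': 'fr', 'Russian': 'ru'}


def select_translation_sheets(sheet_titles, languages):
    """Map each wanted language to its sheet title, skipping OLD sheets."""
    selected = {}
    for title in sheet_titles:
        if translation_sheets_regex not in title:
            continue
        if translation_sheets_not_regex in title:
            continue
        language = title.replace(translation_sheets_regex, "").strip()
        if language in languages:
            selected[language] = title
    return selected


def language_code(language):
    """Look up the two-letter code, ignoring a trailing '(Region)' part."""
    base = language.split(" (")[0]
    return language_letters_dict[base]


# Hypothetical sheet titles, for illustration only.
titles = ["Data Model", "Spanish - Master Sheet", "OLD Spanish - Master Sheet",
          "Dutch (Netherlands) - Master Sheet"]
selected = select_translation_sheets(titles, languages_to_pull)
```

Stripping the parenthetical in `language_code` bridges the gap between 'Dutch (Netherlands)' in languages_to_pull and the 'Dutch' key in language_letters_dict.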

Motivation

We need to have translations.json files in locales to support other languages

Possible Alternatives

Support English only and hard-code the strings.

Additional Context

Please comment here for more detail or to work through fixing the issue. You can ask @ngiangre for assistance with Python scripting.

@pavel-ilin

I'm happy to remind myself how to work with python!

@ngiangre
Contributor Author

go ahead @pavel-ilin!

@pavel-ilin pavel-ilin self-assigned this Apr 29, 2020
@SomeMoosery SomeMoosery added this to To do in CoronaTracker Kanban Board via automation Apr 29, 2020
@SomeMoosery SomeMoosery added the Backend Issues pertaining to the Back end team label Apr 29, 2020
CoronaTracker Kanban Board automation moved this from To do to Done Apr 30, 2020
@ngiangre
Contributor Author

This still needs work!!

@ngiangre ngiangre reopened this Apr 30, 2020
CoronaTracker Kanban Board automation moved this from Done to In progress Apr 30, 2020
@SomeMoosery
Member

My bad - didn't think about how the "fixes" keyword would close this!

@ngiangre
Contributor Author

no worries haha I’m barely keeping up. ISSUE IS BACK OPEN! We need to create nested jsons from google api pulled csv files!

@pavel-ilin pavel-ilin mentioned this issue Apr 30, 2020
8 tasks
@ngiangre
Contributor Author

ngiangre commented May 1, 2020

Some progress on this. Here's a preview

[Image: Screen Shot 2020-05-01 at 12 19 43 AM]

and an example where an array of values would be favorable:

[Image: Screen Shot 2020-05-01 at 12 30 34 AM]

I posted a translation.json in the #engineering channel on discord if y'all want to see the full json.

Let me know if this looks good and would be workable! @AdhamAH @pavel-ilin @SomeMoosery

@ngiangre
Contributor Author

ngiangre commented May 3, 2020

There has been tremendous work done on this by @kristianr on discord - thank you!!

We have one more step: converting key strings longer than 20 characters into shorter strings using common NLP techniques such as stemming and stop-word removal.

The goal is to make a representative and short key string for the education facts and quizzes. This would be a medium priority issue that would be an easy integration into the current algorithm that @nickg and @kristianr have on discord.
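
To make the idea concrete, here is a rough sketch of the shortening step using only the standard library. It covers stop-word removal and truncation at word boundaries but not stemming. The stop-word list, the underscore separator, and the name `shorten_key` are assumptions, and this is not the algorithm that exists on Discord.

```python
import re

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on",
              "or", "the", "to", "with", "your"}


def shorten_key(text, max_length=20):
    """Shorten a long key by dropping stop words and cutting at a word boundary."""
    if len(text) <= max_length:
        return text  # only keys over the limit are rewritten
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return text[:max_length]
    # Fall back to all words if every word was a stop word.
    kept = [w for w in words if w not in STOP_WORDS] or words
    key = ""
    for word in kept:
        candidate = word if not key else f"{key}_{word}"
        if len(candidate) > max_length:
            break
        key = candidate
    # If even the first word is too long, truncate it.
    return key or kept[0][:max_length]


short = shorten_key("Wash your hands often with soap and water")
```

Two different long strings can shorten to the same key, so whatever integrates this would need a collision check, for example appending a counter.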

Can someone work on this?

3 participants