
Automate MB upload via script #147

Open
abettermap opened this issue Dec 8, 2020 · 29 comments
Labels
  • effort: 2 🥈 (med): Average amount of work
  • focus: 🏧 automation: e.g. CI/CD, GitHub Actions, data conversion, deployments
  • priority: 0 (wishlist): Some day!
  • type: ✨ enhancement: New feature, improvement, or functionality

Comments

@abettermap
Contributor

...so Ross can upload MB data in one shot.

Originally posted by @abettermap in #144 (comment)

@abettermap
Contributor Author

@rperlin-ela

I think I can work this in with the Airtable stuff pretty well. I'll be using NodeJS for all of it, so let's see if you have it first.

  1. Open Terminal on your Mac
  2. Type which node
  3. Press Enter

If it's a path

...like /ok/this/that/stuff, then enter node -v and let me know what it says.

If it's not

...e.g. something like node not found, then we'll need to install it. The easiest way is likely the official installer, so download and run that, but when you get to this step:

[screenshot of the Node installer step]

...if you're able to Change Install Location..., set it to something in your home directory (you know, the one with "Downloads", "Pictures", etc.). Create a folder called "Node" or something, then stuff it there.

If not, it should still work, it just unfortunately wants admin rights for any global install-y stuff. Either way, finish the install steps then make sure it stuck:

  1. Close Terminal
  2. Open Terminal
  3. Enter which node again

Got a path now? If so, you should be set for the Node part. If not, we'll have to troubleshoot.

Installing Node is a one-off btw, won't have to repeat it unless you use a different computer.

@abettermap
Contributor Author

This also transitions into a larger question: now that I can access data more easily via Airtable, are we okay with only uploading data to MB with the bare minimum of fields? I think it would just be the ones needed for labels, symb, ID, and location:

  • ID
  • Language
  • Endonym
  • Latitude
  • Longitude
  • Size
  • Status
  • World Region

If so, how should I handle the potential disparity between Airtable and MB, or should I just assume that they are one and the same? Note that only a change to the fields above (or removing/adding records, obviously) would require a fresh upload. In other words, if you make any changes to language-level stuff which doesn't involve any of those fields, there would be no need to upload since I'm hitting Airtable for everything, which should be more or less real-time like Sheets.

Does that make sense?
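Concretely, the script could strip each Airtable record down to just those fields before generating the MB upload. A minimal sketch (field names taken from the list above; `toMapboxRow` and the sample record are hypothetical, not existing code):

```javascript
// Sketch: trim an Airtable record down to the minimal fields MB needs.
const MB_FIELDS = [
  'ID', 'Language', 'Endonym', 'Latitude', 'Longitude',
  'Size', 'Status', 'World Region',
];

function toMapboxRow(record) {
  const row = {};
  for (const field of MB_FIELDS) {
    if (field in record.fields) row[field] = record.fields[field];
  }
  return row;
}

// Example: a fake Airtable record with an extra field that should be dropped
const sample = {
  id: 'recABC123',
  fields: {
    ID: 42,
    Language: 'Seke',
    Endonym: 'Seke',
    Latitude: 40.7473,
    Longitude: -73.8857,
    Size: 'Smallest',
    Status: 'Community',
    'World Region': 'Southern Asia',
    Description: 'Stays Airtable-only, never uploaded', // dropped below
  },
};

console.log(toMapboxRow(sample));
```

Anything outside `MB_FIELDS` (Description, Macrocommunity, etc.) would stay Airtable-only and never trigger a re-upload.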

@abettermap
Contributor Author

Actually the Airtable Plus plan (looks like you've upgraded, nice!) supports scripts, so I might be able to do the MB upload within Airtable. Stay tuned...

@abettermap
Contributor Author

It's pretty robust and would probably work, but this statement:

The scripting app is part of Airtable Apps, a Pro plan feature. Apps let you extend the functionality of your bases: you can use apps to bring new information into Airtable, visualize and summarize your records in colorful ways, and even directly integrate your Airtable bases with your favorite apps. The scripting app has been made available on the Free and Plus plans until March 2021.

...makes me wonder what happens in March: does it just disappear on the Plus plan? Or is it like a "get it while it's hot (and free)" thing where if we install it now (already did) then we're grandfathered?

@abettermap
Contributor Author

Installing the app itself says this:

Users on Free and Plus plans can use the scripting app until March 2021!

I'm reading that like "...and then we'll hope your workflow requires it and you'll upgrade to Pro!".

Is that what it looks like to you? If so, then go ahead with the Node install.

@rperlin-ela
Collaborator

If it's a path

...like /ok/this/that/stuff, then enter node -v and let me know what it says.

Ok, I installed following instructions and now get a path (/usr/local/bin/node) when I type which node into terminal and the version is v14.15.1

This also transitions into a larger question: now that I can access data more easily via Airtable, are we okay with only uploading data to MB with the bare minimum of fields? I think it would just be the ones needed for labels, symb, ID, and location:

  • ID
  • Language
  • Endonym
  • Latitude
  • Longitude
  • Size
  • Status
  • World Region

What about (instance-level) Description and Macrocommunity?

I take it we're talking all AirTable now, no more Sheets, and that the script would be pulling from "Data" in AirTable, right? So once there are some changes in "Data", I run the script. Are you saying that otherwise, if I'm getting it, everything else can be changed in real-time with the Config sheet in AirTable? Would be good to test this out soon and maybe walk through the whole AirTable setup, which looks intense. Nice work.

If so, how should I handle the potential disparity between Airtable and MB, or should I just assume that they are one and the same? Note that only a change to the fields above (or removing/adding records, obviously) would require a fresh upload. In other words, if you make any changes to language-level stuff which doesn't involve any of those fields, there would be no need to upload since I'm hitting Airtable for everything, which should be more or less real-time like Sheets.

Does that make sense?

Yes, as above, I get it in theory. But not sure what you mean "potential disparity" — you mean between AirTable Config and MB or AirTable Data and MB? In general, I would assume AirTable is always what's more up to date, right?

Is that what it looks like to you? If so, then go ahead with the Node install.

Your interpretation sounds right — I'd rather have our own evergreen(ish) way of getting it done.

@abettermap
Contributor Author

What about (instance-level) Description and Macrocommunity?

Those aren't needed in the map though, right?

I take it we're talking all AirTable now, no more Sheets, and that the script would be pulling from "Data" in AirTable, right?

correct, should just be the handful of fields i mentioned above that will end up in MB. more of a separation of concerns that way, and the rest of the Airtable-driven stuff (used by omnibox and table) can load in the background instead of waiting for the map.

no more Sheets

correct, except for the two census tables (the 2014-2018 ones) i think? that could go in Airtable as a separate base as well. it wouldn't really be integrated with validation in the Data or Config (which I renamed to Language btw) tables in Airtable like it currently is with sheets, but i'm confident Matt can handle the once-a-year or maybe-never field name changes in the LUT_PUMA_Fields and LUT_Tract_Fields carefully if needed.

if we move the 2014-2018 over to their own Airtable bases, I think we will be Sheets-free at that point. still moving parts of course, just fewer parts and more movement. ⚙️

So once there are some changes in "Data", I run the script.

if those changes include new records, deleted records, or changed records with any of the MB fields i mentioned above, then yes. alternatively if the script doesn't pan out (new territory there), Airtable allows a CSV download just like Sheets. And actually if you're fine with that for now, i can just make you an MB-only view in Airtable like i started here. Honestly if it's just those fields then i think an upload script is way overkill. i proposed it when we were doing the everything-in-MB approach, so i'm not sure it's worth it now. Airtable provides a nice little CSV based on that view i created: ready-for-mb-csv.zip

i have a LOT on my plate to adapt the code to Airtable and polish up the Airtable base, so i would prefer to focus on that. sorry for any time you lost to Node, but it's probably not as much time as i lost to brainstorming the script, researching the easiest way for you to install it, and then documenting the steps. ;)

Are you saying that otherwise, if I'm getting it, everything else can be changed in real-time with the Config sheet in AirTable?

Totes McGoats. For better or worse with that "real-time" part, but that's always been the case using the same data for prod and deploys. It would be smart to have two sets of synced Airtable stuff, one for each of those two env's, but we've gotten this far with the current single-environment (aka same data used by both prod and deploys) and i'm pretty careful about avoiding breaking changes, and being quick about "pulling the PR merge rug real quick" when it does affect prod.

all this and really anything i'm doing w/Airtable is super-duper out of this scope of work, so i'm going to backpedal away from the dual-env or fancy-script path before i've gone too far.

Would be good to test this out soon and maybe walk through the whole AirTable setup

yeah for sure. let me clean up Airtable a bit, it's a hodgepodge of WIP and ready right now, and a total of 0% of the code has been adapted to it so far.

if you do any Airtable perusing (which I would encourage), the main things are:

  1. The top-level Explore tables which were formerly LUTs have been renamed to their respective column names (e.g. Country instead of LUT_Countries or whatever). This way I can hit them in the API with the same text coming from our own URL routes.
  2. There are a LOT of fields (30+ in Data and Language) and a lot of hidden fields. It's mostly because any time there's a "Link" field between two tables, it creates a new field, plus any of the related ones, plus potentially some formula/processing fields (e.g. to keep only unique values). It's a total mess for editing, but it's very easy to toggle and search for columns (and search for tables for that matter), and it doesn't affect my end in the code because i can control which fields i hit.
  3. Most of the LUTs are just that, single-column stuff like we had in Sheets. However, LUT_Status and LUT_Size I intend to eventually use-use in the UI. (LUT_Status is cool, check it out)
  4. The Meta tables are total wishlist.
  5. column and table descriptions are a mixed bag, but i have a few populated. feel free to populate the others for some practice. i think most of this could replace or at least supplement our Data Schema sheet.
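Point 1 is the whole trick: a route segment doubles as the table name, so the API URL is just a template. A sketch of that mapping (the base ID here is a placeholder, not the real one):

```javascript
// Sketch: build the Airtable REST URL for a table whose name matches a
// URL route segment (e.g. /Explore/Country hits the "Country" table).
// "appXXXXXXXXXXXXXX" is a placeholder base ID.
function airtableUrl(baseId, tableName) {
  return `https://api.airtable.com/v0/${baseId}/${encodeURIComponent(tableName)}`;
}

console.log(airtableUrl('appXXXXXXXXXXXXXX', 'World Region'));
```

`encodeURIComponent` handles table names with spaces like "World Region", so the route text can be passed straight through.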

which looks intense—nice work.

it is but it really just comes down to 1-2 trickier concepts: Link/Lookup fields. but when you compare the "tricky" here to the nasty formulas, fragile validation, and named ranges i had in Sheets, it's a cakewalk. well it is NOW anyway, but 24-hours-ago Jason might disagree. ;)

more this weekend.

@rperlin-ela
Collaborator

if we move the 2014-2018 over to their own Airtable bases, I think we will be Sheets-free at that point. still moving parts of course, just fewer parts and more movement. ⚙️

Let's see what Matt says, but having it all in AirTable seems like a good idea.

if those changes include new records, deleted records, or changed records with any of the MB fields i mentioned above, then yes. alternatively if the script doesn't pan out (new territory there), Airtable allows a CSV download just like Sheets. And actually if you're fine with that for now, i can just make you an MB-only view in Airtable like i started here.

Cool, so this is the new Final Output

Honestly if it's just those fields then i think an upload script is way overkill. i proposed it when we were doing the everything-in-MB approach, so i'm not sure it's worth it now. Airtable provides a nice little CSV based on that view i created: ready-for-mb-csv.zip

i have a LOT on my plate to adapt the code to Airtable and polish up the Airtable base, so i would prefer to focus on that. sorry for any time you lost to Node, but it's probably not as much time as i lost to brainstorming the script, researching the easiest way for you to install it, and then documenting the steps. ;)

No worries, the old CSV process is pretty chill. The script is a nice bonus but not crucial.

For better or worse with that "real-time" part, but that's always been the case using the same data for prod and deploys. It would be smart to have two sets of synced Airtable stuff, one for each of those two env's, but we've gotten this far with the current single-environment (aka same data used by both prod and deploys) and i'm pretty careful about avoiding breaking changes, and being quick about "pulling the PR merge rug real quick" when it does affect prod.

all this and really anything i'm doing w/Airtable is super-duper out of this scope of work, so i'm going to backpedal away from the dual-env or fancy-script path before i've gone too far.

Yep, understood. I get the real-time perils here too.

let me clean up Airtable a bit, it's a hodgepodge of WIP and ready right now, and a total of 0% of the code has been adapted to it so far.

Ok, I may peruse but will wait for the bat signal from you, and maybe even an Airtable 101 before going at it the way I'm used to in Sheets

if you do any Airtable perusing (which I would encourage), the main things are:

  1. The top-level Explore tables which were formerly LUTs have been renamed to their respective column names (e.g. Country instead of LUT_Countries or whatever). This way I can hit them in the API with the same text coming from our own URL routes.
  2. There are a LOT of fields (30+ in Data and Language) and a lot of hidden fields. It's mostly because any time there's a "Link" field between two tables, it creates a new field, plus any of the related ones, plus potentially some formula/processing fields (e.g. to keep only unique values). It's a total mess for editing, but it's very easy to toggle and search for columns (and search for tables for that matter), and it doesn't affect my end in the code because i can control which fields i hit.
  3. Most of the LUTs are just that, single-column stuff like we had in Sheets. However, LUT_Status and LUT_Size I intend to eventually use-use in the UI. (LUT_Status is cool, check it out)
  4. The Meta tables are total wishlist.
  5. column and table descriptions are a mixed bag, but i have a few populated. feel free to populate the others for some practice. i think most of this could replace or at least supplement our Data Schema sheet.

Useful, thanks, yes very cool to see all this (CMS-ish?) stuff all laid out here. Look forward to taking a spin through here with you

@abettermap
Contributor Author

Cool, so this is the new Final Output

In the sense that it's what you upload to MB, yes. But not in the comprehensive sense in that it won't be what gets hit in the UI/API.

@abettermap added labels on Jan 7, 2021: type: ✨ enhancement, priority: 0 (wishlist), focus: 🏧 automation, effort: 2 🥈 (med)
@rperlin-ela
Collaborator

Re-upping that either a script or a smoother CSV workflow into MB would be desirable. The current workflow relies on the free CSV > GeoJSON converter Ogre, which is giving an error; now using a different one (http://geojson.io) instead, but it would be good to reduce reliance on these tools.

@abettermap
Contributor Author

Yeah it's a pretty choppy workflow. I thought the geojson ogre steps were only for using MB Datasets though, and we got past that? I could be totally wrong on the ogre part, but you could give it a try using the good old fashioned ogre-free MB-direct route. If it craps out, back to ogre I guess.

@rperlin-ela
Collaborator

Pretty sure that CSV > Tileset is "successful" but doesn't fully compute, so it seems like it's needing a JSON for whatever reason. Point here is that Ogre doesn't work either, at least how it used to, which is why I'm documenting the switch to geojson.io here, and the dependency on these tools.

@abettermap
Contributor Author

Ok looks like geojson.io strips out the Latitude/Longitude columns and only uses them for geometry. I tried a hack of adding two more columns in AT (yeah yeah I know, abyss haha), and it "works" but unfortunately geojson.io converts them to text. Gahhh

Anyhoo this tool seems to work and even has some other options which may be useful to you. I believe you can keep all the defaults as-is, and just continue using the CSV downloaded from the "For Map" AT view (to which I added two new columns lat and lon).

Re: OGRE: looks like there was a change to the platform 6 days ago which appears to be causing the upload error (someone else also reported it). It shouldn't matter whether we use ogre or another tool, so your choice on that. If the ogre issue gets fixed then we can go back to it and remove the lat/lon columns; otherwise use the other tool in the meantime.
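For reference, the lat/lon-stripping behavior is easy to avoid if we ever roll our own converter. A hedged sketch (not any of these tools' actual code) of turning one parsed CSV row into a GeoJSON feature while keeping Latitude/Longitude as numeric properties:

```javascript
// Sketch: convert one parsed CSV row into a GeoJSON Point feature,
// keeping Latitude/Longitude as *numbers* in properties (the part
// geojson.io drops or stringifies). Column names assumed from the MB view.
function rowToFeature(row) {
  const lat = Number(row.Latitude);
  const lon = Number(row.Longitude);
  return {
    type: 'Feature',
    geometry: { type: 'Point', coordinates: [lon, lat] }, // GeoJSON order is [lon, lat]
    properties: { ...row, Latitude: lat, Longitude: lon },
  };
}

// CSV parsers hand everything over as strings, hence the Number() coercion
const feature = rowToFeature({
  ID: '1',
  Language: 'Seke',
  Latitude: '40.7473',
  Longitude: '-73.8857',
});
console.log(JSON.stringify(feature, null, 2));
```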

@rperlin-ela
Collaborator

rperlin-ela commented Feb 1, 2021 via email

@abettermap
Contributor Author

Odd! If the code hasn't changed and AT schema hasn't changed, it's gotta be something w/the conversion and upload process, right?

Can you confirm that the "View results in map" was indeed working like a week ago though? I'm pretty sure it was, just making sure we're not chasing the wrong hunch.

@abettermap
Contributor Author

I can confirm that it's NOT working in the deploy from January 6 but that's not very useful since it's hitting the current MB/AT data. In addition to AT snapshots, we'd really benefit from MB backups as well since there's no "Undo" in scenarios like we're in now.

If I were you I would create a folder in Drive, and each time you upload/replace MB tileset:

  1. Upload the geojson or CSV to the folder
  2. Rename it to something with a date prefix or suffix

Then we at least have something to go back to for the spatial. Obviously doesn't help us right now, but it might for future troubleshooting/lifesaving.

@rperlin-ela
Collaborator

rperlin-ela commented Feb 1, 2021 via email

@rperlin-ela
Collaborator

rperlin-ela commented Feb 1, 2021 via email

@abettermap
Contributor Author

Yeesh, so many moving parts, eh?

The only things I can confirm on my end are that the code has not been changed since January 22 and that ogre is now back up and running thanks to a super-fast response by those guys! So now you can upload to there again, but I'm not sure it will fix our problem as the Lat/Lon are coming through as strings now (and same even from the previous ogre version, which i cloned and tried locally):

[screenshot: Latitude/Longitude values shown as strings]

Can you upload a geojson to this thread from the most recent time you know it was working? I just want to see if it's strings or numbers.

Problem could be connected to conversion and upload because that changed (on Jan. 29) and I might not have noticed the issue at first.

What do you mean by changed? Something besides the data itself?

@abettermap
Contributor Author

I'm not ruling out AT as the cause either, but the id values are unique and Lat/Lon looks to be a number, and there are no empty rows.

I imagine there is a hack to make this all work in code but that's not a good solution since the problem did not originate in code, and not getting to the root of the AT/MB conversion cause might bite us sooner than later.

@abettermap
Contributor Author

One big thing I am noticing is that id (which is a number in AT) becomes a string in MB when uploaded as GeoJSON, but remains a number when uploaded via the downloaded CSV. I recall having problems with this before, but if my code is correct (and I have to say it is since it hasn't been changed in over a week) then it's expecting a string.

The problem w/uploading the GeoJSON (from ogre or whatever), however, is that it's bringing Latitude/Longitude in as a string as well. It's a number and always should be, so that's kind of a deal-breaker.

What we need in the future is more control over the MB layers. Their Studio does its best to make type decisions automatically, but for things like whether id should be a string, we don't seem to have much control, and I'm not sure what changed in your workflow or data to make it different.
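Until we have that control, a pre-upload sanity check could at least catch the string/number drift before it reaches MB. A sketch (property names assumed from this thread; only Lat/Lon are checked here since id's expected type is still up in the air):

```javascript
// Sketch: flag features whose Latitude/Longitude came through as strings
// instead of numbers, before uploading the GeoJSON to MB.
function findTypeProblems(geojson) {
  const problems = [];
  geojson.features.forEach((feature, i) => {
    for (const key of ['Latitude', 'Longitude']) {
      const value = feature.properties[key];
      if (typeof value !== 'number') {
        problems.push(`feature ${i}: ${key} is ${typeof value}, expected number`);
      }
    }
  });
  return problems;
}

// Example: one feature where the converter stringified Latitude
const suspect = {
  type: 'FeatureCollection',
  features: [
    { type: 'Feature', properties: { id: '1', Latitude: '40.7', Longitude: -73.9 } },
  ],
};
console.log(findTypeProblems(suspect));
```

An empty result means the file is at least type-safe to upload; anything else means stop and look at the converter.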

@rperlin-ela
Collaborator

rperlin-ela commented Feb 1, 2021 via email

@abettermap
Contributor Author

not seeing geojsons, you can just email them to me. or better yet, make the Drive folder and dump them in there.

@rperlin-ela
Collaborator

rperlin-ela commented Feb 1, 2021 via email

@abettermap
Contributor Author

Thanks, I'll take a look tomorrow if I have a chance.

@rperlin-ela
Collaborator

rperlin-ela commented Feb 2, 2021 via email

@abettermap
Contributor Author

Whew!

Man I don't know if that was it or not. I thought I tried both the old ogre locally and the new fixed one on the web without any luck. My only thought is that MB is somehow hanging onto a cached version or something. I tried an incognito window and all that, but maybe it just needed more time?? I have no idea other than that but I'm glad it worked.

Heard nyc is getting dumped on, stay warm and dry! ☃️

@rperlin-ela
Collaborator

Note to future selves: Ogre not working for this in recent months, getting error in Mapbox when replacing tileset. Using https://mygeodata.cloud instead, which seems to work.

@abettermap
Contributor Author

Gahh what a hassle, sorry you have to go through all those steps. Glad there's a (new) workaround though.
