
Cumbersome and slow build process #66

Open
evansiroky opened this issue Dec 14, 2021 · 2 comments

evansiroky (Member) commented Dec 14, 2021

The build process for this website is problematic from a development perspective:

  • it takes multiple steps to build the development site
  • three environments need to be set up (two Python virtual environments and one Node.js install)
  • it takes over an hour to build everything from scratch
  • it takes over 10 minutes and over 1 GB of data (which will grow every month!) to upload the site to the cloud

The whole project could use a major refactor to speed everything up. Ideally, there should be a way to extract only the data needed to build and preview the site locally, so that changes can be seen in a matter of seconds, if not through a live-reload webserver. Furthermore, a Jupyter Notebook may not be the right tool here, since it is merely an intermediary used to generate some images and JSON data that could be produced by other means.
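For illustration, a minimal sketch of the kind of single-command, single-slice workflow this is asking for; the module and its flags are hypothetical and do not exist in the repo today:

```python
"""Hypothetical incremental build entry point; nothing here exists in the repo yet."""
import argparse
from typing import Optional

def build(month: Optional[str]) -> None:
    # Placeholder: extract and render only the requested month (or everything).
    scope = month if month else "all months"
    print(f"building site data for {scope}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Incremental site build (sketch)")
    parser.add_argument("--month", help="rebuild a single month, e.g. 2021-11")
    args = parser.parse_args()
    build(args.month)
```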

holly-g (Contributor) commented Dec 16, 2021

@themightychris how about we pair next week to run through the current build process and identify opportunities to optimize the pipeline?

machow (Contributor) commented Dec 17, 2021

Hey -- I have two quick thoughts:

You can already build the preview site without a full rebuild

  • you can build for only a single month, as long as you don't change the names/format that the data is stored in.
  • 1 GB is not a lot of data for the cloud.
  • If the plots were generated from data on the frontend, or if we were more selective about uploading only the data (rather than the full state, including the papermill notebooks), I think it would cut the upload size a lot; a sketch of the selective upload follows this list. edit: (I think we could put @ryon on this to tackle front-end or SVG plots fairly independently..!)
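As a rough illustration of the selective upload idea (the `build/` directory, file extensions, and bucket name are all assumptions, not the current deploy script):

```python
import subprocess
from pathlib import Path

BUILD_DIR = Path("build")            # assumed output directory
PUBLISH = {".json", ".svg", ".png"}  # ship only data and rendered plots,
                                     # not notebook state from papermill

for path in BUILD_DIR.rglob("*"):
    if path.is_file() and path.suffix in PUBLISH:
        dest = f"gs://example-site-bucket/{path.relative_to(BUILD_DIR)}"
        # gsutil cp uploads a single object; check=True surfaces failures.
        subprocess.run(["gsutil", "cp", str(path), dest], check=True)
```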

Making the builds very fast with warehouse table views

If the data for the reports is simply one large table per metric that can be filtered on date and feed, then you'll only need to pull each table once per metric (and do quick local filters for date and feed, rather than running SQL queries per slice).

90% of the time is spent on HTTP requests, rather than computation. If the metrics lived in tables, rather than being computed in SQL per month per agency, the data could probably all be pulled in e.g. a GitHub Action and the site data regenerated across all months very quickly.
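For concreteness, a minimal sketch of that pattern using pandas; the metric names, bucket path, and column names are assumptions, not the repo's actual schema:

```python
import pandas as pd

# Hypothetical layout: one warehouse view per metric, keyed by date and feed.
METRICS = ["on_time_performance", "headway_adherence"]

# Pull each metric's full table once (one request per metric)...
tables = {
    metric: pd.read_parquet(f"gs://example-warehouse/views/{metric}.parquet")
    for metric in METRICS
}

def month_slice(metric: str, month: str, feed_key: str) -> pd.DataFrame:
    """...then slice it locally instead of issuing a SQL query per month per agency."""
    df = tables[metric]
    in_month = df["date"].dt.to_period("M").astype(str) == month  # assumes datetime dtype
    return df[in_month & (df["feed_key"] == feed_key)]

# Regenerating one report's data becomes a fast in-memory filter:
otp_nov = month_slice("on_time_performance", "2021-11", "some_feed_key")
```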
