Process data at a finer spatial granularity #58

Open
GaelVaroquaux opened this issue Mar 18, 2020 · 10 comments
Labels
High priority (This issue should be prioritized), plotly (Requires skills with plotly express), PyData (Requires PyData skills)

Comments

@GaelVaroquaux
Contributor

The data from Johns Hopkins comes at the level of regions / provinces. Ideally, we should build the map at this level. Forecasting at this level would also be interesting, provided that there are enough cases (forecasting from few cases is unreliable).

This enhancement will require some work, but it seems a worthwhile addition to the site.

GaelVaroquaux added the PyData label Mar 18, 2020
@GaelVaroquaux
Contributor Author

GaelVaroquaux commented Mar 20, 2020

I think that the challenge is plotting on the map: we need to get the shape of each region / province. Maybe it's a geojson?

Here is a discussion on US states in a world map:
https://community.plot.ly/t/state-boundaries-on-a-world-map-projection/11698/4

Based on the following documentation, the only predefined geometries are the countries and the US states:
https://plot.ly/python/choropleth-maps/#using-builtin-country-and-state-geometries
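For regions outside the built-in geometries, one possible route is to pass a GeoJSON to plotly express directly. The following is only a minimal sketch, assuming a plotly version whose `px.choropleth` accepts `geojson` and `featureidkey`, and assuming each feature stores its name under `properties.name`. `TOY_GEOJSON`, `feature_ids` and `province_choropleth` are hypothetical names made up for illustration; the toy square only demonstrates the expected structure and is not a real province.

```python
# Hypothetical one-feature GeoJSON standing in for real province shapes.
TOY_GEOJSON = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Toyland"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
            },
        }
    ],
}


def feature_ids(geojson, key="properties.name"):
    """List the identifier of each feature, following a dotted key path."""
    ids = []
    for feature in geojson["features"]:
        value = feature
        for part in key.split("."):
            value = value[part]
        ids.append(value)
    return ids


def province_choropleth(geojson, cases_by_name, key="properties.name"):
    """Colour each GeoJSON shape by its case count (needs plotly installed)."""
    import plotly.express as px  # lazy import: feature_ids works without plotly

    names = list(cases_by_name)
    fig = px.choropleth(
        geojson=geojson,
        locations=names,
        color=[cases_by_name[name] for name in names],
        featureidkey=key,  # where plotly looks up each location in the GeoJSON
    )
    # Zoom on the drawn shapes instead of showing the whole world.
    fig.update_geos(fitbounds="locations", visible=False)
    return fig
```

`feature_ids` is there to inspect which names the GeoJSON actually exposes before trying to plot anything against them.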

GaelVaroquaux added the plotly label Mar 20, 2020
@emmanuelle
Contributor

Currently taking a look at http://www.naturalearthdata.com/downloads/110m-cultural-vectors/
@jorisvandenbossche do you think it's a good resource, or would you recommend another one? (Sorry for the ping!)

@emmanuelle
Contributor

If we want a quick solution, we could use the Lat / Lon info of the dataset to draw a scatter plot with one marker per lat / lon pair. No shapefiles are needed there. Of course, having the shapes is nicer, but it also means more data for the whole page.
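A rough sketch of that scatter approach, assuming rows with `Province/State`, `Country/Region`, `Lat` and `Long` fields plus a single count column. The `Cases` column, the helper names and the sample rows are hypothetical and made up for illustration; the real file may need reshaping before it looks like this.

```python
def bubble_columns(rows, cases_key="Cases"):
    """Turn dataset rows into parallel columns, skipping rows without cases."""
    columns = {"lat": [], "lon": [], "size": [], "label": []}
    for row in rows:
        cases = int(row[cases_key])
        if cases <= 0:
            continue  # a zero-sized marker adds nothing to the map
        province = row.get("Province/State") or ""
        country = row["Country/Region"]
        columns["lat"].append(float(row["Lat"]))
        columns["lon"].append(float(row["Long"]))
        columns["size"].append(cases)
        columns["label"].append(f"{province}, {country}" if province else country)
    return columns


def bubble_map(rows):
    """One marker per lat / lon pair, sized by case count (needs plotly)."""
    import plotly.express as px  # lazy import: bubble_columns works without plotly

    cols = bubble_columns(rows)
    return px.scatter_geo(
        lat=cols["lat"],
        lon=cols["lon"],
        size=cols["size"],
        hover_name=cols["label"],
    )


# Made-up rows for illustration only, not real counts.
SAMPLE_ROWS = [
    {"Province/State": "British Columbia", "Country/Region": "Canada",
     "Lat": "49.28", "Long": "-123.12", "Cases": "5"},
    {"Province/State": "", "Country/Region": "Japan",
     "Lat": "36.0", "Long": "138.0", "Cases": "3"},
    {"Province/State": "Yukon", "Country/Region": "Canada",
     "Lat": "64.28", "Long": "-135.0", "Cases": "0"},
]
```

Only coordinates and counts are sent to the page, which is the appeal compared with shipping region shapes.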

@jorisvandenbossche

Natural Earth indeed has States/Provinces, but the question remains whether they match the regions provided in the data. Is there an example of the data?
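One way to check the match, sketched under the assumption that both sides can be reduced to plain lists of names. The helper names and the example lists are hypothetical; real lists would come from the dataset and from the shapefile attributes.

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(name.casefold().split())


def match_regions(data_names, shape_names):
    """Split dataset region names into those with and without a shape."""
    known = {normalise(name) for name in shape_names}
    matched = [name for name in data_names if normalise(name) in known]
    unmatched = [name for name in data_names if normalise(name) not in known]
    return matched, unmatched


# Illustrative names only.
data_names = ["Ontario", "New South Wales", "Hubei ", "Grand Princess"]
shape_names = ["ontario", "New South Wales", "Hubei", "Quebec"]
matched, unmatched = match_regions(data_names, shape_names)
print(unmatched)
```

Anything left in `unmatched` would need a manual mapping or would have to be dropped from the map.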

@emmanuelle
Contributor

@jorisvandenbossche

Thanks for that link. So https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/ has state and province shapes. I can take a look tomorrow to see whether it is relatively straightforward to match those.
But, e.g., the COVID data for the US even comes per county, not per state (although it should be easy to aggregate those per state).
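The per-state aggregation could look roughly like this, assuming county entries are labelled in a "County, ST" form. That label format, the function name and the counts below are assumptions made for illustration, not taken from the dataset.

```python
from collections import defaultdict


def aggregate_by_state(county_counts):
    """Sum counts of 'County, ST' entries into one total per state code."""
    totals = defaultdict(int)
    for place, count in county_counts.items():
        # Keep the part after the last comma, e.g. "WA" from "King County, WA".
        _, sep, state = place.rpartition(",")
        if not sep:
            continue  # no comma: not a county-level entry, leave it out
        totals[state.strip()] += count
    return dict(totals)


# Made-up counts, for illustration only.
counties = {
    "King County, WA": 10,
    "Snohomish County, WA": 4,
    "Cook County, IL": 2,
    "Washington": 7,
}
state_totals = aggregate_by_state(counties)
print(state_totals)
```

Entries without a comma are skipped here so that a state-level row is not added on top of its own counties.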

@emmanuelle
Contributor

Thanks for taking a look. The Johns Hopkins dataset (which we are using at the moment) only has province / state information for a handful of countries (this might change in the future):

>>> countries = df['Country/Region']
>>> countries.value_counts()[:50]
US                     247
China                   33
Canada                  12
France                   9
Australia                9
United Kingdom           7
Netherlands              4
Denmark                  3
Japan                    1

So for now this correspondence must be checked for 8 countries. I'll try it with the US states (county-level information is great, but state-level should be fine for now), since plotly's choropleth trace already knows the geometry of US states.
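A sketch of the US-states attempt using the built-in geometry, which is keyed by two-letter state codes (`locationmode="USA-states"`). The lookup table here is deliberately partial and, like the function names, hypothetical; a real one needs every state and territory.

```python
# Partial, illustrative lookup; a real one needs all states and territories.
STATE_CODES = {"Washington": "WA", "New York": "NY", "California": "CA"}


def to_state_codes(cases_by_state, codes=STATE_CODES):
    """Re-key counts by postal code and report names that have no code."""
    coded, missing = {}, []
    for name, cases in cases_by_state.items():
        code = codes.get(name)
        if code is None:
            missing.append(name)
        else:
            coded[code] = cases
    return coded, missing


def us_state_choropleth(cases_by_state):
    """Colour US states by case count with plotly's built-in shapes."""
    import plotly.express as px  # lazy import: to_state_codes works without plotly

    coded, _ = to_state_codes(cases_by_state)
    state_codes = list(coded)
    return px.choropleth(
        locations=state_codes,
        locationmode="USA-states",  # match locations against state codes
        color=[coded[code] for code in state_codes],
        scope="usa",
    )
```

The `missing` list is a cheap way to spot entries, such as cruise ships, that have no state shape to land on.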

@emmanuelle
Contributor

Related to this: #79. We can assume that county-level data is incomplete or unreliable, so let's not use it and focus on state-level data for the US.

@emmanuelle
Contributor

Also see CSSEGISandData/COVID-19#1250

@emmanuelle
Contributor

In fact, the region info is only useful for Canada, Australia and China. For the other countries, the regions correspond to overseas territories.

GaelVaroquaux added the High priority label Apr 2, 2020

3 participants