Canadian FluWatch Data #1944

Open
dajmcdon opened this issue Feb 10, 2024 · 2 comments
Labels
API addition New signals

Comments

@dajmcdon

dajmcdon commented Feb 10, 2024

I would like to add a few signals, all from the same source. The data is in tables published weekly by the Public Health Agency of Canada. That link is for the most recent report.

I'm currently working with a student to collect all the previous versions, but it would be great to serve all of this data along with new information as it is added. My goal would be to have this ready to go by the beginning of next year's season, which is usually near the end of August.

Data details

  • Raw data is available in HTML tables at the link above. I can try to contact the agency to see if there is an easier and more timely way to access it.
  • There are a few different geographic strata. Some signals are available at the "lab" level, which is currently about 30 different labs across the country (names and province of location are clear). Some are provincial (usually 10 to 13 units, depending on whether the 3 territories are combined). And some are aggregated to a few regions (Atlantic, Quebec, Ontario, Prairies, BC, Territories).
  • There are multiple signals, each with a clear time tag. The issue date is also part of the report.
  • New issues revise previous values for some signals. Others only report the most recent observation, with no way to track revision behaviour. More aggregated data is more likely to include revisions.
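Until there is a better feed, the tables would have to be scraped. Below is a minimal sketch using only Python's standard-library `html.parser`, assuming each table is a plain `<table>` with `<tr>`/`<th>`/`<td>` rows, explicit closing tags, and no nested tables. The column names in the sample are hypothetical, not the actual FluWatch headers.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the currently open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags, captions) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html):
    """Return the table rows in `html` as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample; the real tables have different columns.
sample = (
    "<table><tr><th>Province</th><th>Tests</th><th>Positive</th></tr>"
    "<tr><td>Ontario</td><td>1200</td><td>85</td></tr></table>"
)
print(parse_table(sample))
# [['Province', 'Tests', 'Positive'], ['Ontario', '1200', '85']]
```

If pandas is available, `pandas.read_html` does the same job and returns DataFrames, but it needs lxml (or BeautifulSoup with html5lib) installed. Either way, a format change on the website would break the scraper.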

Additional context

Some other questions @nmdefries suggested I address:

Does the data have revisions? If there are revisions, how often, how far back, on which signals, etc.

Yes, but only some of the signals. Revisions go back as far as the beginning of the current season.
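Since the revising signals re-publish the season in each report, keeping every (location, week, issue) row lets us reconstruct what was known as of any past report. A sketch, assuming weeks are encoded as hypothetical `YYYYWW` integers and using a made-up `ON` location code:

```python
from typing import NamedTuple


class Record(NamedTuple):
    geo: str         # location identifier (hypothetical codes)
    time_value: int  # week the value refers to, e.g. 202401
    issue: int       # week of the report that published the value
    value: float


def as_of(records, issue):
    """Return {(geo, time_value): value} as known at report `issue`.

    For each (geo, time_value), keep the value from the latest report
    published on or before `issue`; later revisions are ignored.
    """
    best = {}  # (geo, time_value) -> (issue, value)
    for r in records:
        if r.issue > issue:
            continue
        key = (r.geo, r.time_value)
        if key not in best or r.issue > best[key][0]:
            best[key] = (r.issue, r.value)
    return {key: value for key, (_, value) in best.items()}


records = [
    Record("ON", 202401, 202401, 10.0),  # first report of week 1
    Record("ON", 202401, 202402, 12.0),  # week 1 revised in the next report
    Record("ON", 202402, 202402, 15.0),
]
print(as_of(records, 202401))  # {('ON', 202401): 10.0}
print(as_of(records, 202402))  # {('ON', 202401): 12.0, ('ON', 202402): 15.0}
```

`YYYYWW` integers compare correctly across year boundaries (202352 < 202401), which is the only property the filter relies on.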

What are the limitations of the data? e.g. lack of geo coverage, any censoring, based on a biased sample/not representative

Some or all of this. The most disaggregated data is for a specific (potentially biased) collection of labs. There is also internal processing with private data that I'm not aware of. Some geographies have greater coverage than others.

Any processing that the source does. e.g. normalization, smoothing, censoring

Not really.

Whether you foresee us needing to derive any signals or we can report as-is

I think "as is" mainly. It may be helpful to convert some percentages to raw counts, but this isn't entirely necessary (as the denominator is also present in the source).
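If we do convert, it is just the percentage times its denominator. A hedged sketch, assuming the percentage and its denominator (e.g. number of tests) come from the same row; since the published percentage is itself rounded, the recovered count is approximate and can be off by one or more when the denominator is large.

```python
def percent_to_count(percent, denominator):
    """Convert a reported percentage back to an approximate count.

    Rounds to the nearest integer because the source percentage is
    already rounded, so the exact count is not always recoverable.
    """
    if denominator < 0:
        raise ValueError("denominator must be non-negative")
    return round(percent / 100 * denominator)


print(percent_to_count(12.5, 400))  # 50
```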

What geo data we need and where to get it

Probably just a few crosswalks with Lab -> Province -> Region. I can help with this.
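A sketch of how those crosswalks could be applied to roll lab counts up to provinces and then regions. The lab identifiers are hypothetical, and the province/territory-to-region grouping is an assumption based on the six regions listed above that should be checked against the source.

```python
from collections import defaultdict

# Hypothetical lab identifiers; the real list would come from the reports.
LAB_TO_PROVINCE = {
    "lab_a": "ON",
    "lab_b": "ON",
    "lab_c": "QC",
    "lab_d": "AB",
}

# Assumed grouping into the six regions named in the issue (to be verified).
PROVINCE_TO_REGION = {
    "NL": "Atlantic", "PE": "Atlantic", "NS": "Atlantic", "NB": "Atlantic",
    "QC": "Quebec",
    "ON": "Ontario",
    "MB": "Prairies", "SK": "Prairies", "AB": "Prairies",
    "BC": "BC",
    "YT": "Territories", "NT": "Territories", "NU": "Territories",
}


def aggregate(counts, crosswalk):
    """Sum counts keyed by a finer geography into a coarser one."""
    totals = defaultdict(int)
    for geo, count in counts.items():
        totals[crosswalk[geo]] += count  # KeyError flags an unmapped geo
    return dict(totals)


lab_counts = {"lab_a": 5, "lab_b": 3, "lab_c": 4, "lab_d": 2}
province_counts = aggregate(lab_counts, LAB_TO_PROVINCE)
region_counts = aggregate(province_counts, PROVINCE_TO_REGION)
print(province_counts)  # {'ON': 8, 'QC': 4, 'AB': 2}
print(region_counts)    # {'Ontario': 8, 'Quebec': 4, 'Prairies': 2}
```

Summing only makes sense for counts; percentages would need their numerators and denominators aggregated separately.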

On my end:

The data is public, but not otherwise served. I should check any data use information to see if there are potential issues. I should also check whether they're willing to give us a more useful format/source than scraping the website when it gets released (and being subject to unknown decisions to change the format).

@dajmcdon dajmcdon added the API addition New signals label Feb 10, 2024
@dsweber2
Contributor

I was talking with Ron about this, and the partial revision behavior came up. Do you think we should start archiving now, before the flu season ends, for those signals which don't include revision behavior? Or is your student effectively covering that aspect already?

@dajmcdon
Author

I think it's actually easier than that, but maybe I'm not being very clear. I think there are 2 cases:

  1. Some signals report weekly. And each weekly issue also contains revisions to all past time values for the season. But the previous reports all remain online and accessible. So we can scrape it all at once (now or later), then during the season, just grab the most recent issue which will contain multiple time values for each location.
  2. Some signals report weekly. But they don't include any previous time values. So for any issue, there is always 1 unique time value. Past reports remain online, but I'm guessing they are never updated. So there is 1 issue per time value and no way to track revisions at all.
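One way to tell the two cases apart from scraped rows: in case 1 a single issue carries several time values for the same location, while in case 2 it never does. A sketch over hypothetical `(geo, time_value, issue)` tuples:

```python
from collections import defaultdict


def classify_signal(rows):
    """Return 1 or 2 for the two cases described above.

    rows: iterable of (geo, time_value, issue) tuples scraped from reports.
    Case 1: some issue carries several time values for one geo, so the
            report re-publishes past weeks and revisions are visible.
    Case 2: every issue carries one time value per geo, so each
            (geo, time_value) is seen once and revisions cannot be tracked.
    """
    times = defaultdict(set)  # (geo, issue) -> time values in that issue
    for geo, time_value, issue in rows:
        times[(geo, issue)].add(time_value)
    return 1 if any(len(t) > 1 for t in times.values()) else 2


print(classify_signal([("ON", 202401, 202402), ("ON", 202402, 202402)]))  # 1
print(classify_signal([("ON", 202401, 202401), ("ON", 202402, 202402)]))  # 2
```

For case 1 signals, the weekly job then only needs the latest report; for case 2, each report adds exactly one new week.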
