I would like to add a few signals all from the same source. The data is in Tables published weekly by the Public Health Agency of Canada. That link is for the most recent report.
I'm currently working with a student to collect all the previous versions, but it would be great to serve all of this data along with new information as it is added. My goal would be to have this ready to go by the beginning of next year's season, which is usually near the end of August.
Data details
Raw data is available in HTML tables at the link above. I can try to contact the agency to see if there is an easier and more timely way to access it.
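If scraping turns out to be the only option, the tables could be pulled out with nothing beyond the Python standard library. A minimal sketch, assuming the report pages are ordinary HTML tables (`TableExtractor` and `extract_tables` are hypothetical names, not an existing tool). It copes with omitted `</td>`/`</tr>` tags but does not handle nested tables. `pandas.read_html` is a shorter alternative if `lxml` (or BeautifulSoup with `html5lib`) is installed.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> on a page as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self.tables = []  # one list of rows per <table>
        self._row = None  # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None outside a cell

    def _close_cell(self):
        # Finish the open cell (if any), collapsing runs of whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        # Finish the open row (if any) and attach it to the current table.
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # Opening a new cell/row/table implicitly closes the previous one,
        # which copes with HTML that omits </td> or </tr>.
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr":
            self._close_row()
            if self.tables:
                self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return all tables in `html` as nested lists: tables -> rows -> cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table comes back header row first, with cell text whitespace-normalized; parsing numbers (e.g. stripping thousands separators) is left to the caller.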
There are a few different geographic strata. Some signals are available at the "lab" level, currently about 30 labs across the country (each lab's name and province are clearly given). Some are provincial (usually 10-13 units, depending on whether the 3 territories are combined). And some are aggregated to a few regions (Atlantic, Quebec, Ontario, Prairies, BC, Territories).
There are multiple signals, each with a clear time tag. The issue date is also given in each report.
New issues revise previous values for some signals. Others report only the most recent observation, with no way to track revision behaviour. (More aggregated data is more likely to include revisions.)
Additional context
Some other questions @nmdefries suggested I address:
Does the data have revisions? If there are revisions, how often, how far back, on which signals, etc.?
Yes, but only some of the signals. Revisions go back as far as the beginning of the current season.
What are the limitations of the data? e.g. lack of geo coverage, any censoring, based on a biased sample/not representative
A bit of all of these. The most disaggregated data comes from a specific (potentially biased) collection of labs. There is also internal processing involving private data whose details I'm not aware of. Some geographies have greater coverage than others.
Any processing that the source does, e.g. normalization, smoothing, censoring
Not really.
Whether you foresee us needing to derive any signals or we can report as-is
Mainly "as is", I think. It may be helpful to convert some percentages to raw counts, but this isn't strictly necessary, since the denominator is also present in the source.
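Converting a percentage back to a count is a single multiplication by the denominator. A minimal sketch, assuming the percentage is on a 0-100 scale and the denominator is the total it was computed from, e.g. number of tests (`percent_to_count` is a hypothetical name):

```python
def percent_to_count(percent, denominator):
    """Approximate the count behind a percentage, e.g. positive tests from % positive.

    Assumes `percent` is on a 0-100 scale and `denominator` is the total it was
    computed from (e.g. number of tests). The result is rounded to the nearest
    integer because the published percentage may itself be rounded.
    """
    if denominator < 0 or not 0 <= percent <= 100:
        raise ValueError("expected 0 <= percent <= 100 and denominator >= 0")
    return round(percent / 100 * denominator)
```

If the published percentage is rounded (say, to one decimal place), the recovered count can be off by a few when the denominator is large, so converted counts may be worth flagging as derived.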
What geo data we need and where to get it
Probably just a few crosswalks with Lab -> Province -> Region. I can help with this.
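A sketch of what those crosswalks might look like as plain dictionaries, and how lab-level counts could be rolled up. The province-to-region grouping assumes the usual split of the six regions above (Atlantic = NL, PE, NS, NB; Prairies = MB, SK, AB; Territories = YT, NT, NU) and should be checked against the reports; the lab names are placeholders.

```python
from collections import defaultdict

# Assumed province -> region grouping, matching the six regions named in the issue.
PROVINCE_TO_REGION = {
    "NL": "Atlantic", "PE": "Atlantic", "NS": "Atlantic", "NB": "Atlantic",
    "QC": "Quebec",
    "ON": "Ontario",
    "MB": "Prairies", "SK": "Prairies", "AB": "Prairies",
    "BC": "BC",
    "YT": "Territories", "NT": "Territories", "NU": "Territories",
}

# Placeholder lab names (hypothetical); the real list would be built from the reports.
LAB_TO_PROVINCE = {"Lab A": "ON", "Lab B": "ON", "Lab C": "NS", "Lab D": "AB"}


def aggregate(lab_counts, level):
    """Sum lab-level counts (e.g. positive tests) up to "province" or "region"."""
    if level not in ("province", "region"):
        raise ValueError(f"unknown level: {level!r}")
    totals = defaultdict(int)
    for lab, count in lab_counts.items():
        province = LAB_TO_PROVINCE[lab]  # KeyError flags a lab missing from the crosswalk
        key = province if level == "province" else PROVINCE_TO_REGION[province]
        totals[key] += count
    return dict(totals)
```

Only additive quantities (counts, test totals) should be summed this way; percentages would need to be recomputed from the summed numerator and denominator. Summed lab counts may also not match the published provincial or regional figures if the labs don't cover all testing.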
On my end:
The data is public, but not otherwise served. I should check any data use terms to see if there are potential issues. I should also check whether they're willing to give us a more useful format/source than scraping the website every time a report is released (and being subject to unannounced format changes).
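To catch silent format changes early, a scraper could refuse to parse a table whose header row isn't what it expects. A minimal sketch; `EXPECTED_HEADER` and its column names are made up for illustration, and the real ones would come from the published tables. The input is assumed to be a list of rows, each a list of cell strings.

```python
class FormatChanged(Exception):
    """Raised when a scraped table no longer matches the layout the parser expects."""


# Hypothetical header for one of the tables; the real column names would come from the reports.
EXPECTED_HEADER = ["Province", "Tests", "Positive", "% Positive"]


def check_header(table, expected=EXPECTED_HEADER):
    """Return the data rows of `table` if its first row matches `expected`,
    ignoring case and surrounding whitespace; otherwise raise FormatChanged."""
    if not table:
        raise FormatChanged("table is empty")
    found = [cell.strip().lower() for cell in table[0]]
    wanted = [name.strip().lower() for name in expected]
    if found != wanted:
        raise FormatChanged(f"expected columns {expected}, found {table[0]}")
    return table[1:]
```

Failing loudly on a header mismatch means a layout change stops ingestion instead of silently putting values in the wrong columns.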
I was talking with Ron about this, and the partial revision behavior came up. Do you think we should start archiving now, before the flu season ends, for those signals which don't include revision behavior? Or is your student effectively covering that aspect already?
I think it's actually easier than that, but maybe I wasn't being very clear. I think there are 2 cases:
Some signals report weekly, and each weekly issue also contains revisions to all past time values for the season. The previous reports all remain online and accessible, so we can scrape them all at once (now or later), then during the season just grab the most recent issue, which will contain multiple time values for each location.
Some signals report weekly. But they don't include any previous time values. So for any issue, there is always 1 unique time value. Past reports remain online, but I'm guessing they are never updated. So there is 1 issue per time value and no way to track revisions at all.
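Both cases fit the same long-format archive, with one row per (location, time value, issue). A minimal sketch, assuming that layout (`Row` and `as_of` are hypothetical names): `as_of` reconstructs what the data looked like on a given issue date by keeping, for each location and time value, the value from the latest issue published on or before that date.

```python
from datetime import date
from typing import NamedTuple


class Row(NamedTuple):
    geo_value: str     # lab, province, or region
    time_value: date   # week the observation refers to
    issue: date        # date of the report it was published in
    value: float


def as_of(rows, as_of_issue):
    """Return {(geo_value, time_value): value} as it would have looked on `as_of_issue`.

    For each (geo_value, time_value), keep the value from the latest issue
    published on or before `as_of_issue`.
    """
    latest = {}
    for row in rows:
        if row.issue > as_of_issue:
            continue  # not yet published at that date
        key = (row.geo_value, row.time_value)
        if key not in latest or row.issue > latest[key].issue:
            latest[key] = row
    return {key: row.value for key, row in latest.items()}
```

In the first case, scraping each past report fills in the full revision history; in the second, each issue contributes one time value per location, so the as-of view is simply every report published so far.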