Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sensorization #568

Open
wants to merge 20 commits into
base: dv-package
Choose a base branch
from
Open

Sensorization #568

wants to merge 20 commits into from

Conversation

rumackaaron
Copy link
Contributor

@rumackaaron rumackaaron commented Nov 20, 2020

Performs sensorization for DV signal.

  • For each time_value t:
    • Create a single linear mapping (f) from a global case rate to global DV, fit globally using the previous [j, k] days
    • For each location i:
      • Create linear mapping (g_i) from DV to cases, fit using just location i and the previous [j, k] days
      • At time t and for location i, report f(g_i(DV(t))). In other words, pass DV(t) → g_i → f

EDIT: After Addison's DAP, Ryan has decided that it would be better to perform sensorization statically, not using a sliding window for training. So the algorithm has been modified to:

  • Create a single linear mapping (f) from a global case rate to global DV, fit using the entire history
  • For each location i:
    • Create linear mapping (g_i) from DV to cases, fit using just location i, using the entire history
    • At time t and for location i, report f(g_i(DV(t))). In other words, pass DV(t) → g_i → f

@rumackaaron
Copy link
Contributor Author

Sorry, I meant to merge this into the dv-package branch. Is that still possible or do I need to create a new PR?

@rumackaaron rumackaaron changed the base branch from main to dv-package November 20, 2020 18:51
@rumackaaron
Copy link
Contributor Author

Tagging @krivard

@krivard
Copy link
Contributor

krivard commented Nov 30, 2020

Is this still under consideration, or are we backing off of sensorization entirely?

@rumackaaron
Copy link
Contributor Author

We are trying to tweak the method of both the sensorization map and the inverse map, but I don't think we're backing off entirely.

@krivard
Copy link
Contributor

krivard commented Dec 2, 2020

okay -- maybe convert to draft while you're tweaking, then it'll be easier to know when someone should review it

@rumackaaron rumackaaron marked this pull request as draft December 2, 2020 21:56
@rumackaaron rumackaaron marked this pull request as ready for review January 12, 2021 15:31
Comment on lines +24 to +27
DATE_COL = "ServiceDate" #"servicedate"
GEO_COL = "PatCountyFIPS" #"patCountyFIPS"
AGE_COL = "PatAgeGroup" #"patAgeGroup"
HRR_COLS = ["Pat HRR Name", "Pat HRR ID"]#["patHRRname", "patHRRid"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these comments for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, HSP changed the column names of the files they were sending us. I was testing the code on a recent drop (with lowercase column names). The code originally had the uppercase names, so I changed them back to pass the tests and to merge the PR.

@@ -140,7 +147,7 @@ def update_sensor(
params = Weekday.get_params(data) if weekday else None

# handle explicitly if we need to use Jeffreys estimate for binomial proportions
jeffreys = True if se else False
jeffreys = se
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this var needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's used in sensor.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i meant if it's going to be jeffreys = se, why not just use the sevariable and not create/assign it to thejeffreys` varrable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fair. I'll change it

@rumackaaron
Copy link
Contributor Author

From Ryan:

Update: my current view is that the best way to sensorize at time t is to use all available history for the sensor regression model. Then use this fitted model to define a new sensor at time t, and also redefine the sensor at all times 1,...,t-1. This will give us the best of both worlds: improved comparisons across geos, and stable comparisons across time. It can be accomplished using the "as of" parameter in the COVIDcast API, as I explain in my comment here. See also Addison's DAP.

Suggestion: Aaron first implements this "offline" to make sure that it truly works as expected. Then Aaron + engineering work to make this possible when stored a sensor in the API, through the "as of" parameter. Doing this efficiently, without using up a ton of memory going forward, requires us to do some dynamic but simple calculations on the API side.

To save memory, Ryan suggests we store the indicator values for the non-sensorized signal (indexed by time, location, and issue) and the location-specific and time-invariant coefficients (indexed by location and issue) on the server, instead of storing the non-sensorized signal as well as the sensorized signal.

This is not currently supported, as I output the sensorized signal just like any other signal. Thoughts on whether this is worth implementing on the API side and on the indicator side?

fips_pop = pd.read_csv("%s/fips_pop.csv" % (geo_folder),\
dtype={"fips":str,"pop":int})
if geo.lower() == "state":
fips_state = pd.read_csv("%s/fips_state_table.csv" % (geo_folder),\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason geomapper util isnt used here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@mariajahja
Copy link
Member

This is not currently supported, as I output the sensorized signal just like any other signal. Thoughts on whether this is worth implementing on the API side and on the indicator side?

Just to make sure I understand -- the sensorized signal will be in addition to the existing indicator, or will it replace it? I understand it may be less confusing on the map to only show the sensorized version, but we'd still like the API to provide the original indicator.

@krivard
Copy link
Contributor

krivard commented Jan 12, 2021

Thoughts on whether this is worth implementing on the API side and on the indicator side?

This approach sounds similar to/compatible with cmu-delphi/delphi-epidata#239. In general I'm in favor of such an effort, but it's a much bigger lift than we can complete by the end of the month. Probably doable as a Q1 goal though. Who else should weigh in?

Comment on lines +286 to +304
geomapper = GeoMapper()
fips_pop = pd.DataFrame({"fips":sorted(list(geomapper.get_geo_values("fips")))})
fips_pop = fips_pop[fips_pop.fips.str.slice(start=2) != "000"]
fips_pop = geomapper.add_population_column(fips_pop,"fips",geocode_col="fips")
if geo.lower() == "state":
geo_weights = geomapper.replace_geocode(
fips_pop,"fips","state_id",from_col="fips",new_col="geo",date_col=None
)
elif geo.lower() == "hrr":
geo_weights = geomapper.replace_geocode(
fips_pop,"fips","hrr",from_col="fips",new_col="geo",date_col=None
)
elif geo.lower() == "msa":
geo_weights = geomapper.replace_geocode(
fips_pop,"fips","msa",from_col="fips",new_col="geo",date_col=None
)
elif geo.lower() == "county":
geo_weights = fips_pop.rename(columns={"fips":"geo"})
geo_weights = geo_weights.rename(columns={"population":"weight"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like something that could eventually live in the geomapper util to get populations for everything, but for now would recommend moving this into its own function that returns the weight DF.



@staticmethod
def sensorize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot going on here -- I'd split it up into a few different methods that get called depending on the various conditions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants