Estimate non-zero FlowAmounts when below source reporting threshold #302

Closed
matthewlchambers opened this issue Dec 6, 2022 · 6 comments
Labels
enhancement New feature or request flowbyactivity flow by activity files

Comments

@matthewlchambers
Collaborator

The Aircraft row is missing from a newly generated EPA_GHGI_T_3_14 FBA (year is 2016, if that matters).

@bl-young bl-young added the feature branch Issues related to feature branches not impacting master label Dec 6, 2022
@bl-young
Collaborator

bl-young commented Dec 6, 2022

Yes, there are a few lingering issues in a handful of tables that I'm working through. Feel free to document any more that you see here.

@matthewlchambers
Collaborator Author

I see that for 2016, that row is identified as being less than 0.05 MMT (specific value not given). Is that why it's excluded? If so, maybe there's a better way of handling that situation?

@bl-young
Collaborator

bl-young commented Dec 6, 2022

> I see that for 2016, that row is identified as being less than 0.05 MMT (specific value not given). Is that why it's excluded? If so, maybe there's a better way of handling that situation?

Yep, that would be it. I do think your update to add a new field for that makes sense. In the current GHGI branch it's not yet implemented.

@WesIngwersen WesIngwersen changed the title Missing row from EPA_GHGI_T_3_14 FBA Estimate non-zero FlowAmounts when below source reporting threshold Feb 9, 2023
@WesIngwersen WesIngwersen added enhancement New feature or request flowbyactivity flow by activity files and removed feature branch Issues related to feature branches not impacting master labels Feb 9, 2023
@WesIngwersen
Collaborator

Renamed this issue to broaden the scope to the problem at the heart of this example: a source provides no value but also does not report zero, giving a symbol such as '+' or '-' instead. This can happen, for example, when the value is smaller than the significant figures provided.
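
To make the distinction concrete, here is a minimal sketch of separating a placeholder symbol from a true numeric value while keeping the symbol for later use. The function name `parse_reported_value` and the `PLACEHOLDERS` set are hypothetical and not part of flowsa; each source defines its own symbols.

```python
from typing import Optional, Tuple

# Hypothetical placeholder symbols; a real source defines its own set.
PLACEHOLDERS = {"+", "-"}


def parse_reported_value(cell: str) -> Tuple[Optional[float], Optional[str]]:
    """Split a raw table cell into (amount, suppressed_flag)."""
    text = cell.strip()
    if text in PLACEHOLDERS:
        # The source reported "something, but not a number": store 0 for
        # now and keep the symbol so a later step can estimate a value.
        return 0.0, text
    try:
        # Drop thousands separators before converting.
        return float(text.replace(",", "")), None
    except ValueError:
        # Any other non-numeric content is treated as missing.
        return None, None


print(parse_reported_value(" + "))
print(parse_reported_value("1,234.5"))
print(parse_reported_value("-0.3"))
```

Only a cell that consists of the bare symbol is flagged, so a negative number such as "-0.3" is still parsed as a number.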

@bl-young
Collaborator

This is really something that needs to be implemented for each specific FBA. We have a few examples now where this is done:

EIA MECS:

df = df.assign(
    FlowAmount=df.FlowAmount.mask(df.FlowAmount.str.isnumeric() == False,
                                  np.nan),
    Suppressed=df.FlowAmount.where(df.FlowAmount.str.isnumeric() == False,
                                   np.nan),
    Spread=df.Spread.mask(df.Spread.str.isnumeric() == False, np.nan)
)

Census SAS:

# set suppressed values to 0 but mark as suppressed
# otherwise set non-numeric to nan
df = (df.assign(
    Suppressed = np.where(df.FlowAmount.str.strip().isin(["S", "Z", "D"]),
                          df.FlowAmount.str.strip(),
                          np.nan),
    FlowAmount = np.where(df.FlowAmount.str.strip().isin(["S", "Z", "D"]),
                          0,
                          df.FlowAmount)))
df = (df.assign(
    Suppressed = np.where(df.FlowAmount.str.endswith('(s)') == True,
                          '(s)',
                          df.Suppressed),
    FlowAmount = np.where(df.FlowAmount.str.endswith('(s)') == True,
                          df.FlowAmount.str.replace(',','').str[:-3],
                          df.FlowAmount),
    ))

GHGI:

# set suppressed values to 0 but mark as suppressed
# otherwise set non-numeric to nan
try:
    df = (df.assign(
        Suppressed = np.where(df.FlowAmount.str.strip() == "+", "+",
                              np.nan),
        FlowAmount = pd.Series(
            np.where(df.FlowAmount.str.strip() == "+", 0,
                     df.FlowAmount.str.replace(',',''))))
          )
    df = (df.assign(
        FlowAmount = np.where(pd.to_numeric(
            df.FlowAmount, errors='coerce').isnull(),
            np.nan, pd.to_numeric(
                df.FlowAmount, errors='coerce')))
          .dropna(subset='FlowAmount')
          )
except AttributeError:
    # if no string in FlowAmount, then proceed
    df = df.dropna(subset='FlowAmount')
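
To see what this flagging step produces, the same idea can be run on a toy frame. This is a simplified sketch rather than the flowsa implementation: the activity names and amounts are made up, and it relies on `pd.to_numeric` for the conversion instead of `np.where`.

```python
import pandas as pd

# Toy data; the activity names and amounts are made up for illustration.
raw = pd.DataFrame({
    "ActivityProducedBy": ["Aircraft", "Passenger Cars", "Rail", "Notes"],
    "FlowAmount": [" + ", "1,234.5", "0.7", "NA"],
})

stripped = raw.FlowAmount.str.strip()
is_suppressed = stripped == "+"

clean = raw.assign(
    # Keep the source symbol so the later estimation step can find it.
    Suppressed=stripped.where(is_suppressed),
    # '+' becomes 0 for now; any other non-numeric entry becomes NaN.
    FlowAmount=pd.to_numeric(
        stripped.str.replace(",", "", regex=False).mask(is_suppressed, "0"),
        errors="coerce",
    ),
).dropna(subset=["FlowAmount"])

print(clean)
```

The Aircraft row survives with a FlowAmount of 0 and a Suppressed value of '+', while the non-numeric "Notes" row is dropped.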

The approach to handling the suppressed data is then specified in an FBS; for example, this function for MECS:

def estimate_suppressed_mecs_energy(
    fba: FlowByActivity,
    **kwargs
) -> FlowByActivity:
    '''
    Rough first pass at an estimation method, for testing purposes. This
    will drop rows with 'D' or 'Q' values, on the grounds that as far as I can
    tell we don't have any more information for them than we do for any
    industry without its own line item in the MECS anyway. '*' is for value
    less than 0.5 Trillion Btu and will be assumed to be 0.25 Trillion Btu
    '''
    if 'Suppressed' not in fba.columns:
        log.warning('The current MECS dataframe does not contain data '
                    'on estimation method and so suppressed data will '
                    'not be assessed.')
        return fba
    dropped = fba.query('Suppressed not in ["D", "Q"]')
    unsuppressed = dropped.assign(
        FlowAmount=dropped.FlowAmount.mask(dropped.Suppressed == '*', 0.25)
    )
    return unsuppressed.drop(columns='Suppressed')
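
The same pattern could plausibly be applied to the GHGI '+' flag. The sketch below works on a plain pandas DataFrame rather than a FlowByActivity object; the function name `estimate_below_threshold` is hypothetical, and using half of the 0.05 MMT threshold is an assumption that mirrors the MECS choice of 0.25 for values below 0.5, not an established flowsa method.

```python
import pandas as pd

# Assumption: '+' marks a value below the 0.05 MMT reporting threshold,
# and the midpoint between 0 and the threshold is used as the estimate.
THRESHOLD_MMT = 0.05


def estimate_below_threshold(df: pd.DataFrame) -> pd.DataFrame:
    """Replace '+'-flagged amounts with half the reporting threshold."""
    if "Suppressed" not in df.columns:
        # Nothing was flagged upstream, so there is nothing to estimate.
        return df
    estimated = df.assign(
        FlowAmount=df.FlowAmount.mask(df.Suppressed == "+", THRESHOLD_MMT / 2)
    )
    return estimated.drop(columns="Suppressed")


# Toy input; the amounts are made up for illustration.
fba = pd.DataFrame({
    "ActivityProducedBy": ["Aircraft", "Rail"],
    "FlowAmount": [0.0, 0.7],
    "Suppressed": ["+", None],
})
result = estimate_below_threshold(fba)
print(result)
```

Keeping the estimate in the FBS step rather than in the FBA means the FBA still records what the source actually reported, and different methods can choose different assumptions.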

@bl-young
Collaborator

Going to close this issue as resolved knowing that this can be added to FBAs as they are updated.
