Plot observations per hydrological year #214

Open
HMEUW opened this issue May 2, 2024 · 5 comments

@HMEUW
Collaborator

HMEUW commented May 2, 2024

I need plots of my long-term observations per year. The x-axis spans one hydrological year (starting 1 April, or a user-defined date), and each observation year gets a different color.

First question: Is it okay to include this in HydroPandas?

Second, if yes, I thought of two options for doing this. Any advice?

  1. Add a new column to the obs-collection that holds the date/time with a dummy year, e.g. 1900 (for the period 1 April - 31 Dec) and dummy year 1901 (for the period 1 Jan - 31 March). The function can then use the pandas/matplotlib x-axis formatting power for plotting times and dates; only the year has to be removed from the x-axis labels. This has my preference.
  2. Add a new column to the obs-collection that holds the day number since 1 April. The function then has to change the x-axis labels after plotting from day numbers to useful dates or months, so we cannot use the pandas/matplotlib power here.
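Option 1 could be sketched roughly as below. This is only an illustration, not HydroPandas code: the function name `to_dummy_year` and the parameter `base_year` are hypothetical. It uses the dummy years 1999/2000 instead of 1900/1901 so that 29 February lands in a leap year; that assumes the start date falls after 29 February, otherwise a leap `base_year` would be needed.

```python
import pandas as pd


def to_dummy_year(index, month0=4, day0=1, base_year=1999):
    """Return hydrological-year labels and dummy-year timestamps for plotting.

    Hypothetical helper: base_year / base_year + 1 are the two dummy years.
    """
    month = index.month.to_numpy()
    day = index.day.to_numpy()
    # True for timestamps on or after the start day within their calendar year
    after_start = (month > month0) | ((month == month0) & (day >= day0))
    # Timestamps before the start day belong to the previous hydrological year
    hydro_year = index.year.to_numpy() - (~after_start).astype(int)
    # ... and are drawn in the second dummy year
    dummy_year = base_year + (~after_start).astype(int)
    dates = pd.to_datetime(
        pd.DataFrame({"year": dummy_year, "month": month, "day": day})
    )
    # Re-attach the time of day so sub-daily data keeps its resolution
    plot_x = pd.DatetimeIndex(dates) + (index - index.normalize())
    return pd.DataFrame(
        {"hydro_year": hydro_year, "plot_x": plot_x.to_numpy()}, index=index
    )
```

Plotting `plot_x` per `hydro_year` group then leaves only the tick formatting to take care of.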
@MattBrst
Contributor

MattBrst commented May 2, 2024

I previously used something similar to option 1, which may serve as inspiration. It calculates the number of days elapsed since 1 January of the current year (as a difference of Julian dates), but it should be possible to use any other start date as well. It also works with data at a higher frequency than daily.

df['year'] = df['date'].dt.year
for i, row in df.iterrows():
    df.loc[i, 'doy']= row['date'].to_julian_date() - row['date'].replace(month=1, day=1, hour=0, minute=0, second=0).to_julian_date()
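For long series the row-by-row loop can get slow. A vectorized variant might look like the sketch below; the sample `df` is made up for illustration, and the column names follow the snippet above.

```python
import pandas as pd

# Made-up sample data with sub-daily timestamps
df = pd.DataFrame(
    {"date": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 12:00", "2024-03-01 06:00"])}
)

df["year"] = df["date"].dt.year
# 1 January 00:00 of each row's calendar year
year_start = pd.to_datetime(df["year"].astype(str), format="%Y")
# Fractional days elapsed since the start of the year
df["doy"] = (df["date"] - year_start) / pd.Timedelta(days=1)
```

Replacing `year_start` with the start of the hydrological year would give the day number needed for option 2.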

@dbrakenhoff
Collaborator

dbrakenhoff commented May 2, 2024

I'm all for more pretty plots, and this sounds like a useful plot to make.

As for the implementation, I would keep it a bit simpler using groupby:

obs  # my obs
gr = obs[column].groupby(by=obs.index.year)
fig, ax = plt.subplots()
for year, group in gr:
    ax.plot(group.index.dayofyear, group.values, label=year)

# some code to nicely set the date labels using DateFormatter or something along those lines
ax.set_xticklabels(...)

There's always the question of how to handle leap-years, but that's just a choice you have to make.
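As a runnable illustration of the groupby idea, here is a sketch with synthetic daily data. Because the x-values are day-of-year integers, the labels are set by hand at the first day of each month of a non-leap reference year (2001, an arbitrary choice), rather than with a DateFormatter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily observations covering three calendar years
idx = pd.date_range("2020-01-01", "2022-12-31", freq="D")
obs = pd.Series(np.sin(2 * np.pi * idx.dayofyear / 365.25), index=idx)

fig, ax = plt.subplots()
for year, group in obs.groupby(obs.index.year):
    ax.plot(group.index.dayofyear, group.values, label=year)

# Tick at the first day of each month, labelled with the month abbreviation
month_starts = pd.date_range("2001-01-01", periods=12, freq="MS")
ax.set_xticks(list(month_starts.dayofyear))
ax.set_xticklabels(list(month_starts.strftime("%b")))
ax.legend()
```

In leap years the ticks from March onward are off by one day, which is one instance of the leap-year choice mentioned above.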

EDIT: the code above doesn't work for data with a higher frequency than daily. In that case you have to compute the index another way (not tested, but an idea off the top of my head): tidx = ref_date + (group.index - group.index[0].round("YS"))
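One way to work out that idea is sketched below. It computes the start of each timestamp's calendar year explicitly instead of using `round("YS")`, which is untested in the comment above. The function name `shift_to_reference_year` and the default `ref_date` are hypothetical.

```python
import pandas as pd


def shift_to_reference_year(index, ref_date="2000-01-01"):
    """Move every timestamp to ref_date plus its elapsed time since 1 January."""
    # 1 January 00:00 of each timestamp's own calendar year
    year_start = pd.to_datetime(index.year.astype(str), format="%Y")
    # Elapsed time is preserved, so sub-daily resolution is kept
    return pd.Timestamp(ref_date) + (index - year_start)


idx = pd.to_datetime(["2021-01-01 06:30", "2023-07-15 18:00"])
tidx = shift_to_reference_year(idx)
```

Because this keeps elapsed time rather than the calendar date, dates after February can shift by one day whenever the source year and the reference year differ in leap status.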

@HMEUW
Collaborator Author

HMEUW commented May 3, 2024

I mixed up the suggestions of @MattBrst and @dbrakenhoff. I have working code now. Any suggestions?

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# start of the hydrological year, here choose 1 November
month0 = 11
day0 = 1

df = oc_plot.iloc[0].obs

# first process the first calendar year in the requested hydrological year
# for simplicity assign to all data, repair for the second year later

# create column with legend label
df['plot_year'] = df.index.year
# create x-values for plotting
# TO DO: gives PerformanceWarning
df['plot_x'] = df.index + pd.offsets.DateOffset(year=1900)

# overwrite assigned values for dates before month0 and day0

for year in range(df.index.year.min(), df.index.year.max()+1):
    # these belong to the previous hydrological year, so change legend label
    df.loc[
        (df.index >= pd.Timestamp(year, 1, 1)) &
        (df.index < pd.Timestamp(year, month0, day0)), 'plot_year'] = year-1
    # assign year 1900+1 in plotting index
    df.loc[
        (df.index >= pd.Timestamp(year, 1, 1)) &
        (df.index < pd.Timestamp(year, month0, day0)), 'plot_x'] += pd.offsets.DateOffset(year=1901)

# plotting
gr = df.groupby(by=df.plot_year)
fig, ax = plt.subplots()
for plot_year, group in gr:
    ax.plot(group.plot_x, group.stand_m_tov_nap, label=plot_year)

ax.legend()
ax.grid()
ax.set_xlim([pd.Timestamp(1900, month0, day0),  pd.Timestamp(1901, month0, day0)])

[Image "test": plot of the observations per hydrological year]

You are looking at observed groundwater levels in a dike. The steep rise from 1 November onward is interesting, as is the minor variation between the annual maximum water levels despite significant differences in rainfall.
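The snippet imports `matplotlib.dates` but does not use it yet. Hiding the dummy year on the x-axis could look like the sketch below, which uses placeholder data on the 1900/1901 dummy axis. The "%d %b" label format is just one possible choice.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

month0, day0 = 11, 1

# Placeholder data on the dummy-year axis
x = pd.date_range("1900-11-01", "1901-10-31", freq="D")
fig, ax = plt.subplots()
ax.plot(x, range(len(x)))

# One tick per month, labelled with day and month only (no year)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))
ax.set_xlim(pd.Timestamp(1900, month0, day0), pd.Timestamp(1901, month0, day0))
```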

@martinvonk
Collaborator

martinvonk commented May 7, 2024

In SPEI I have a function that does something similar:

https://github.com/martinvonk/SPEI/blob/a422933cb6b98605e143aac846c2374af390afb2/src/spei/utils.py#L78-L92

from pandas import DataFrame, Grouper, Series
from pandas import __version__ as pd_version
from pandas import concat, to_datetime

def group_yearly_df(series: Series) -> DataFrame:
    """Group series in a DataFrame with date (in the year 2000) as index and
    year as columns.
    """
    strfstr: str = "%m-%d %H:%M:%S"
    grs = {}
    freq = "YE" if pd_version >= "2.2.0" else "Y"
    for year_timestamp, gry in series.groupby(Grouper(freq=freq)):
        gry.index = to_datetime(
            "2000-" + gry.index.strftime(strfstr), format="%Y-" + strfstr
        )
        year = getattr(year_timestamp, "year")  # type: str
        grs[year] = gry
    return concat(grs, axis=1)

@HMEUW
Collaborator Author

HMEUW commented May 7, 2024

Thanks for sharing the snippet. Your code is more Pythonic than my for-loop.

@HMEUW HMEUW self-assigned this May 13, 2024