Parse metar from pandas dataframe into another dataframe #3476

Open
jgoriasilva opened this issue Apr 11, 2024 · 3 comments
Labels
Type: Feature New functionality

Comments

@jgoriasilva

What should we add?

I have a DataFrame with a column of METAR report strings. Currently, parse_metar_to_dataframe only accepts a single string as input, so applying it to the column (for example with pandas.Series.apply) generates one DataFrame per string, resulting in a Series of DataFrames.
It would be much easier to use the parser if it could accept a pandas Series and return a single DataFrame with the same columns as it does currently, but where each row is a parsed METAR, instead of a one-row DataFrame for each parsed string.
I might be missing something with the usage, but as I understand there is no way to do it without creating unnecessary overhead with the

Reference

It would be fairly simple to implement this. I can do it from my side and create a pull request, creating a new function that uses the existing parse_metar (from metpy.io) but that accepts a pandas Series of str (or list of str) and returns a single pandas Dataframe.
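
For illustration, a minimal sketch of what such a function could look like. This is not MetPy's API: the name parse_reports, the errors argument, and the toy parser are all assumptions made for this example. The sketch only assumes that the parser is a callable returning a namedtuple, as parse_metar does in the snippets in this thread; with MetPy one would presumably pass something like functools.partial(parse_metar, year=2024, month=4).

```python
from collections import namedtuple

import pandas as pd


def parse_reports(reports, parser, errors='raise'):
    """Parse each report and return one DataFrame with one row per report.

    `parser` is any callable mapping a str to a namedtuple (hypothetical
    interface). With errors='coerce', reports that fail to parse become
    all-NaN rows instead of raising.
    """
    reports = pd.Series(reports)  # accepts a Series or a list of str
    records = []
    for text in reports:
        try:
            records.append(parser(text)._asdict())
        except Exception:
            if errors != 'coerce':
                raise
            records.append({})  # empty record turns into an all-NaN row
    # Reuse the input index so the rows stay aligned with the source data.
    return pd.DataFrame(records, index=reports.index)


# Toy stand-in for a real METAR parser, used only to keep the example runnable.
ToyReport = namedtuple('ToyReport', ['station_id', 'day', 'hour'])


def toy_parser(text):
    station, group = text.split()[:2]
    if not group.endswith('Z'):
        raise ValueError(f'bad time group: {group}')
    return ToyReport(station, int(group[:2]), int(group[2:4]))


obs = pd.Series(['KADS 122347Z', 'SBFL 221300', 'KOUN 122345Z'],
                index=[10, 20, 30])
parsed = parse_reports(obs, toy_parser, errors='coerce')
print(parsed)
```

Because the result carries the same index as the input, it can be joined back onto the original DataFrame without losing row alignment.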

@kgoebber
Collaborator

Hi @jgoriasilva,

Here is what I use to do something similar...

from io import StringIO
from metpy.io import metar
df = metar.parse_metar_file(StringIO('\n'.join(data.metar)),
                            year=date.year, month=date.month)

Here I am using the datetime module to set a date, and StringIO to turn the joined string into a file-like object that the METAR parser from MetPy can read. The above also assumes the pandas DataFrame is called data and has a column named metar.

@jgoriasilva
Author

Thanks for your answer @kgoebber.

That looks good, but doing it that way I would lose the link to the original data DataFrame, particularly the alignment between each parsed METAR and its row in the original DataFrame (parse_metar_file and parse_metar_to_dataframe generate an arbitrary index).

What I would like to do is parse the METAR data from a column of an existing DataFrame and add the columns that parse_metar_to_dataframe generates to that same DataFrame.

Maybe I'm overlooking something here, but one way that I'm currently doing it is like this:

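import pandas as pd
from metpy.io.metar import parse_metar

# Parse each report; year and month are passed positionally to parse_metar.
res = df['metar'].apply(parse_metar, args=(2024, 4))
# Expand the resulting namedtuples into columns, keeping the original index.
columns = res.iloc[0]._fields
res = pd.DataFrame(index=res.index, data=[x._asdict() for x in res.values], columns=columns)
res = res.drop(columns='date_time')

df = pd.concat([df, res], axis=1)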

The problem is that, done this way, I sometimes get a ParseError for the few rows that contain a malformed METAR, which is an additional problem I just found:

ParseError: Line 1: expected one of:

    - [\d] from METAR::datetime
    - "Z" from METAR::datetime

     1 | METAR SBFL 221300 17006KT 9999 BKN020 24/16 Q1017=

I'm still looking for a solution for this as well.
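
The report in the error above has a bare time group, 221300, with no trailing Z, which appears to be what the parser is complaining about. One possible workaround, sketched below, is to normalize such reports before parsing. The helper name add_missing_z and the regular expression are assumptions made for this example and are not part of MetPy; whether MetPy then accepts the corrected report has not been checked here.

```python
import re

# Optional METAR/SPECI prefix, a four-character station ID, then a six-digit
# day/time group (DDHHMM) that is followed by whitespace instead of "Z".
_BARE_TIME = re.compile(r'^((?:(?:METAR|SPECI)\s+)?[A-Z0-9]{4}\s+\d{6})(?=\s)')


def add_missing_z(report):
    """Append "Z" to the day/time group when it is missing (hypothetical helper)."""
    return _BARE_TIME.sub(r'\1Z', report)


# The problematic report from the error message gains a "Z".
print(add_missing_z('METAR SBFL 221300 17006KT 9999 BKN020 24/16 Q1017='))
# A report that already has the "Z" is returned unchanged.
print(add_missing_z('KADS 122347Z 17013G20KT 13SM SCT039 23/14 A2986'))
```

On a DataFrame column this could be applied with df['metar'].map(add_missing_z) before calling the parser. The pattern only looks at the start of the report, so other six-digit groups later in the text are left alone.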

@dopplershift
Member

It's exceedingly frustrating that there's not a way to get Pandas to just expand the tuple into multiple columns, because otherwise parse_metar "just works" with .apply():

from functools import partial
from metpy.io.metar import parse_metar
import pandas as pd

obs = ['KADS 122347Z 17013G20KT 13SM SCT039 23/14 A2986',
       'KBCT 122353Z 12008KT 10SM FEW032 22/16 A3009',
       'KCWA 122347Z 28010KT 10SM CLR 16/M02 A2969',
       'KOUN 122345Z 19014KT 10SM CLR 24/09 A2975']

s = pd.Series(obs)
parser = partial(parse_metar, year=2024, month=4)

s.apply(parser)

gives:

0    (KADS, 32.97, -96.82, 196, 2024-04-12 23:47:00...
1    (KBCT, 26.38, -80.09, 4, 2024-04-12 23:53:00, ...
2    (KCWA, 44.78, -89.67, 389, 2024-04-12 23:47:00...
3    (KOUN, 35.25, -97.47, 357, 2024-04-12 23:45:00...
dtype: object

What data are you working with that's giving you a column with reports in it?
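
For the tuple-expansion step mentioned above, one possible approach is to rebuild a DataFrame from the Series of namedtuples and join it back on the shared index. This is a sketch rather than an endorsed MetPy pattern: the Toy namedtuple and toy_parser are stand-ins invented so the example runs without MetPy, and the column names are made up.

```python
from collections import namedtuple

import pandas as pd

# Stand-in for the namedtuple a real METAR parser would return.
Toy = namedtuple('Toy', ['station_id', 'temperature'])


def toy_parser(text):
    station, temp = text.split()
    return Toy(station, float(temp))


df = pd.DataFrame({'source': ['a', 'b'],
                   'metar': ['KADS 23', 'KOUN 24']},
                  index=[5, 9])

# .apply() yields an object Series holding one namedtuple per row.
parsed = df['metar'].apply(toy_parser)

# Expand the tuples into columns, reusing the Series index for alignment.
expanded = pd.DataFrame(parsed.tolist(),
                        index=parsed.index,
                        columns=list(parsed.iloc[0]._fields))

# Join on the index so every parsed row lines up with its source row.
result = df.join(expanded)
print(result)
```

Taking the column names from the first element assumes every row parsed to the same namedtuple type.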
