Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x2sys_cross: Refactor to get rid of temporary files and have consistent table-like output behavior #3160

Open
seisman opened this issue Apr 8, 2024 · 0 comments
Labels
maintenance Boring but important stuff for the core devs

Comments

@seisman
Copy link
Member

seisman commented Apr 8, 2024

Originally posted by @weiji14 in #2730 (comment)

I had a look at refactoring x2sys_cross to use virtualfiles instead of temporary files, but it's a little tricky because:

  • Input: Cannot pass in virtualfiles as input as mentioned at Wrap x2sys_init and x2sys_cross #546 (comment) and Passing in virtual files into the supplementary x2sys modules gmt#3717, since GMT doesn't support virtualfiles to X2SYS modules

  • Output: The virtualfile_to_dataset method from clib: Add virtualfile_to_dataset method for converting virtualfile to a dataset #3083 was able to produce a pandas.DataFrame output, but the column names were missing. The logic for handling x2sys_cross's output is actually complicated:

    # Read temporary csv output to a pandas table
    if outfile == tmpfile.name: # if outfile isn't set, return pd.DataFrame
    # Read the tab-separated ASCII table
    date_format_kwarg = (
    {"date_format": "ISO8601"}
    if Version(pd.__version__) >= Version("2.0.0")
    else {}
    )
    table = pd.read_csv(
    tmpfile.name,
    sep="\t",
    header=2, # Column names are on 2nd row
    comment=">", # Skip the 3rd row with a ">"
    parse_dates=[2, 3], # Datetimes on 3rd and 4th column
    **date_format_kwarg, # Parse dates in ISO8601 format on pandas>=2
    )
    # Remove the "# " from "# x" in the first column
    table = table.rename(columns={table.columns[0]: table.columns[0][2:]})
    elif outfile != tmpfile.name: # if outfile is set, output in outfile only
    table = None

Important things to handle are:

  1. Datetime columns need to be parsed correctly as datetime64 dtype
  2. x2sys_cross may output multi-segment parts (see https://docs.generic-mapping-tools.org/6.5/supplements/x2sys/x2sys_cross.html#remarks) when multiple tracks are passed in and -Qe (external COEs) is selected. Unsure how this is handled in GMT virtualfiles (note that we actually just merge all the multi-segments into one table when pandas.DataFrame output is selected, output to file will preserve the segments though).
  3. Last two column names can either be z_X/z_M or z_1/z_2 depending on whether trackvalues/-Z argument is set.

It should be possible to handle 1 and 3 somehow, but I'm not so sure about 2 since it will involve checking how GMT outputs virtualfiles in x2sys_cross. We'll need to do some careful checking to ensure the refactoring doesn't modify the output and makes it incorrect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

No branches or pull requests

1 participant