Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix massdownloader channel priority missing data #3188

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

megies
Copy link
Member

@megies megies commented Oct 19, 2022

What does this PR do?

fdsn massdownloader: make channel_priority robust to advertized but missing data

currently (at least when only "weak" availability data is available), any channels that are in principle available according to metadata overrule all other channels that come later in the channel_priority listing, e.g. if the the first item in channel_priority successfully matches a channel, all other channels are ignored, even if the former selected channel actually yields "No data available at server" while some other channels actually do have data but are later in the channel_priority (see #2794)

Currently the only way to fix this is to first try and download all channels' data that match any of the given channel_priority wildards, and then at the very end it is evaluated if some higher priority data was downloaded and lower priority data get deleted again.

This certainly is not ideal, since it might blow up the amount of data downloaded and subsequently discarded, but it is likely the lesser evil compared to losing whole stations from the final download result

Why was it initiated? Any relevant Issues?

Fixes #2794

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • If the PR is making changes to documentation, docs pages can be built automatically.
    Just add the "build_docs" tag to this PR.
    Docs will be served at docs.obspy.org/pr/{branch_name} (do not use master branch).
    Please post a link to the relevant piece of documentation.
  • If all tests including network modules (e.g. clients.fdsn) should be tested for the PR,
    just add the "test_network" tag to this PR.
  • All tests still pass.
  • Any new features or fixed regressions are covered via new tests.
  • Any new or changed features are fully documented.
  • Significant changes have been added to CHANGELOG.txt .
  • First time contributors have added your name to CONTRIBUTORS.txt .
  • If the changes affect any plotting functions you have checked that the plots
    from all the CI builds look correct. Add the "upload_plots" tag so that plotting
    outputs are attached as artifacts.
  • Add the "ready for review" tag when you are ready for the PR to be reviewed.

…issing data

currently (at least when only "weak" availability data is available),
any channels that are in principle available according to metadata
overrule all other channels that come later in the channel_priority
listing, e.g. if the the first item in channel_priority successfully
matches a channel, all other channels are ignored, even if the former
selected channel actually yields "No data available at server" while
some other channels actually do have data but are later in the
channel_priority (see #2794)

Currently the only way to fix this is to first try and download *all*
channels' data that match any of the given channel_priority wildards,
and then at the very end it is evaluated if some higher priority data
was downloaded and lower priority data get deleted again.

This certainly is not ideal, since it might blow up the amount of data
downloaded and subsequently discarded, but it is likely the lesser evil
compared to losing whole stations from the final download result
@megies megies added bug confirmed bug .clients.fdsn labels Oct 19, 2022
@megies megies added this to the 1.4.0 milestone Oct 19, 2022
@megies megies force-pushed the fix_massdownload_channel_prio branch from 9209bc3 to 89524af Compare October 19, 2022 15:58
@megies
Copy link
Member Author

megies commented Oct 19, 2022

CC @core-man

@megies
Copy link
Member Author

megies commented Oct 20, 2022

An alternative approach could be to have Massdownloader.download call itself recursively, iterating through the priorities and skipping station+interval combinations if downloaded data is present in the output. Not sure if I get to trying this out though.

@megies
Copy link
Member Author

megies commented Nov 17, 2022

This isn't an ideal solution yet, I'm tempted to just show a warning about this behavior when channel_priorities is used and postpone this to a later release..

@megies
Copy link
Member Author

megies commented Nov 17, 2022

Showing a warning actually makes no sense, since channel priorities has a channel list as a default, so the warning would show pretty much all the time.

Properly fixing this would take a major refactoring, I fear, so I'll postpone this to a later release, since this PR likely causes a serious amount of additional downloading of data that is later deleted again.

To properly fix this the downloading part should be refactored I think so that the hierarchy isnt "Station > Channel > Interval" but rather "Station > Interval > Channel" and then at the interval level one could keep track of whether data has been downloaded while iterating through the channel priorities and eventually stopping when data was found, rather than the original approach of filtering out channels based on station metadata and channel priorities before starting any downloading.

As stated above that means a major rewrite, which I don't have time for right now. Anybody willing to tackle this, feel free to take a shot at it.

@megies megies modified the milestones: 1.4.0, Future release Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A bug for MassDownloader()
1 participant