Fix massdownloader channel priority missing data #3188
CC @core-man
An alternative approach could be to have
This isn't an ideal solution yet, I'm tempted to just show a warning about this behavior when
Showing a warning actually makes no sense: channel priorities have a channel list as a default, so the warning would show pretty much all the time. Properly fixing this would take a major refactoring, I fear, so I'll postpone this to a later release, since this PR likely causes a serious amount of additional downloading of data that is later deleted again. To fix this properly, I think the downloading part should be refactored so that the hierarchy isn't "Station > Channel > Interval" but rather "Station > Interval > Channel". At the interval level one could then keep track of whether data has been downloaded while iterating through the channel priorities, and stop as soon as data was found, rather than the original approach of filtering out channels based on station metadata and channel priorities before starting any downloading. As stated above, that means a major rewrite, which I don't have time for right now. Anybody willing to tackle this, feel free to take a shot at it.
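To make the proposed "Station > Interval > Channel" ordering concrete, here is a rough sketch of what such a loop could look like for a single station. It is not the mass downloader's code: `download_station`, the `fetch` callable, and the simulated `available` set are hypothetical names made up for illustration, and channels are reduced to plain strings.

```python
from fnmatch import fnmatch


def download_station(channels, intervals, priorities, fetch):
    """Return {interval: channels that delivered data} for one station.

    `fetch(channel, interval)` is a hypothetical callable that returns True
    when the server actually delivered data for that request.
    """
    result = {}
    for interval in intervals:  # Station > Interval
        result[interval] = []
        for pattern in priorities:  # Interval > Channel (by priority)
            matching = [c for c in channels if fnmatch(c, pattern)]
            got = [c for c in matching if fetch(c, interval)]
            if got:
                result[interval] = got
                break  # data found, lower priorities are never requested
    return result


# Simulated server: HH? is advertised in the metadata but has no data.
available = {("BHZ", 0), ("BHN", 0), ("BHE", 0), ("BHZ", 1)}
requested = []


def fetch(channel, interval):
    requested.append((channel, interval))
    return (channel, interval) in available


out = download_station(
    ["HHZ", "HHN", "HHE", "BHZ", "BHN", "BHE"],
    intervals=[0, 1],
    priorities=["HH[ZNE]", "BH[ZNE]"],
    fetch=fetch,
)
```

In this sketch the fallback to `BH?` happens per interval and only after `HH?` came back empty, so nothing is downloaded just to be deleted later. The cost is that every empty higher-priority channel is still requested once per interval.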
What does this PR do?
fdsn massdownloader: make channel_priority robust to advertized but missing data
Currently (at least when only "weak" availability data is available), any channel that is in principle available according to the station metadata overrules all channels that come later in the channel_priority listing. For example, if the first item in channel_priority successfully matches a channel, all other channels are ignored, even if the selected channel actually yields "No data available at server" while other channels later in channel_priority do have data (see #2794).
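The failure mode can be reproduced with a small standalone sketch. This is a simplified model under stated assumptions, not the actual ObsPy implementation: `filter_by_priority`, `advertised`, and `has_data` are hypothetical names, and channels are reduced to plain strings matched with `fnmatch`.

```python
from fnmatch import fnmatch


def filter_by_priority(advertised, priorities):
    """Keep only the channels matching the first pattern that matches anything."""
    for pattern in priorities:
        selected = [c for c in advertised if fnmatch(c, pattern)]
        if selected:
            return selected  # later patterns are never considered
    return []


# Channels the station metadata claims to offer.
advertised = ["HHZ", "HHN", "HHE", "BHZ", "BHN", "BHE"]
# Channels for which the server can actually deliver waveforms.
has_data = {"BHZ", "BHN", "BHE"}

selected = filter_by_priority(advertised, ["HH[ZNE]", "BH[ZNE]"])
delivered = [c for c in selected if c in has_data]
```

Here `selected` contains only the `HH?` channels, so `delivered` ends up empty and the station is lost, although the `BH?` channels would have returned data.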
Currently the only way to fix this is to first try to download the data of all channels that match any of the given channel_priority wildcards. Only at the very end is it evaluated whether higher-priority data was downloaded, in which case the lower-priority data is deleted again.
This certainly is not ideal, since it might blow up the amount of data that is downloaded and subsequently discarded, but it is likely the lesser evil compared to losing whole stations from the final download result.
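The download-everything-then-prune idea described above might be sketched as follows. This is an illustration only and does not reproduce the code in this PR: `prune_lower_priority` and the sample data are hypothetical, and real downloads are replaced by a lookup in a `has_data` set.

```python
from fnmatch import fnmatch


def prune_lower_priority(downloaded, priorities):
    """Split successfully downloaded channels into (keep, delete).

    The highest-priority pattern that matches any downloaded channel wins;
    everything else that was downloaded is marked for deletion.
    """
    for pattern in priorities:
        keep = [c for c in downloaded if fnmatch(c, pattern)]
        if keep:
            delete = [c for c in downloaded if c not in keep]
            return keep, delete
    # No downloaded channel matched any pattern: nothing to keep.
    return [], list(downloaded)


priorities = ["HH[ZNE]", "BH[ZNE]", "LH[ZNE]"]
advertised = ["HHZ", "HHN", "HHE", "BHZ", "BHN", "BHE", "LHZ"]
has_data = {"BHZ", "BHN", "BHE", "LHZ"}  # HH? is advertised but empty

# Step 1: try every channel that matches any of the priority wildcards.
candidates = [c for c in advertised if any(fnmatch(c, p) for p in priorities)]
downloaded = [c for c in candidates if c in has_data]

# Step 2: keep the best priority that really delivered data, drop the rest.
keep, delete = prune_lower_priority(downloaded, priorities)
```

In this example `LHZ` is fetched and then thrown away, which is the extra traffic the description warns about, but the station survives with its `BH?` data instead of vanishing from the result.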
Why was it initiated? Any relevant Issues?
Fixes #2794