catalog_to_dd.py: input and output of dt.cc #516

erhmestel · 2022-09-24T00:38:36Z

What does this PR do?

Editing catalog_to_dd.py to allow input of existing correlation measurements and selection of output filename. Defaults keep everything the same as it currently is.

existing_corr_file option added to write_correlations & compute_differential_times. Default None skips new sections of code.
existing_corr_file read in with new function: _read_correlation_file returning a existing event pairs dictionary keyed to core event, with list of paired events.
existing_pairs dictionary used to remove events paired with core event from sub_catalog so they are not recalculated
existing_corr_file read in & output into output file, followed by newly calculated correlations.
Added option of output_filename to write_correlations. Default "dt.cc"

Why was it initiated? Any relevant Issues?

To prevent having to recalculate correlations each time new events are added & to allow the output of different correlation files without overwriting.

Not been extensively tested. In small amount of testing code works as is, but could be more completely implemented. Things to do:

speed up saving existing correlations output (do I need to write out every line individually?)
sort output by event numbers (interleaving existing with new)
check existing correlations for extra correlations which wouldn't be otherwise calculated
implement checking whether there are new stations missing from existing correlations which are being missed

PR Checklist

develop base branch selected?
This PR is not directly related to an existing issue (which has no PR yet).
All tests still pass.
Any new features or fixed regressions are be covered via new tests.
Any new or changed features have are fully documented.
Significant changes have been added to CHANGES.md.
First time contributors have added your name to CONTRIBUTORS.md.
Boxes left unchecked when I don't know

Correlations from existing_corr_file added to write_correlations & compute_differential_times. Default None skips new sections of code. 1. existing_corr_file read in with new function: _read_correlation_file returning a existing event pairs dictionary keyed to core event, with list of paired events. 2. existing_pairs dictionary used to remove events paired with core event from sub_catalog so they are not recalculated 3. existing_corr_file read in & output into output file, followed by newly calculated correlations. Added option of output_filename to write_correlations. Default "dt.cc" Works as is, could be more completely implemented. Things to do: - implement checking whether there are stations missing from existing correlations - speed up saving existing correlations output (do I need to write out every line individually?) - sort output by event numbers (interleaving existing with new) - check existing correlations for those which wouldn't be calculated otherwise?

Same changes as previous commit, but made to master version of code, rather than older one I was working with.

Comma error removed

calum-chamberlain · 2022-09-26T19:05:52Z

eqcorrscan/utils/catalog_to_dd.py

+            with open(existing_corr_file, "r") as in_f:
+                for line in in_f:
+                    f.write(line)
+        ## TODO: speed up saving existing correlations output (do I need to write out every line individually?)


If you want to just dump everything you can do something like:

with open(blah) as f: f.write(["\n".join(l for l in in_f])

assuming that your lines don't already have a newline character at the end of them. If they do then you can just ''.join(in_f) and write that string.

calum-chamberlain · 2022-09-26T19:19:17Z

Thanks for this @erhmestel! Can you update your branch to track develop (press the "Update branch" button underneath all the test ticks, then pull the updated branch back to your local computer) please?

This looks good, even if stickler doesn't think so. I like using PyCharm for this kind of big project and I set up a ruler at 80 characters long (I forget how to do this, but this SO answer has it. PyCharm is pretty good at removing trailing whitespace for you and you can set up code linters (flake8 is the one run by stickler I think) which will highlight non pep8 friendly code.

I would go for cleaning that up, then work on writing a short test that tests this new behaviour. Once we have a test then we can play around with optimising. If there are any edge cases that you can think of that might result in odd behaviour (e.g. when no new events have been added, or if events are in the correlation file, but not in the catalog provided for the update, ...) then having tests for them would be helpful as well so that we can avoid code crashing as much as possible!

I should also use this as an opportunity to better document:

Default action of computing correlation squared (and provide an option to output correlation);
Default actions of travel time computation as tt1 - tt2.

If you have any suggestions of where those docs should go (e.g. where you would look for them) then let me know, otherwise we can put them in in all the relevant places.

erhmestel added 5 commits September 24, 2022 10:47

test

52b07e4

Same changes to master

bf0d11a

Same changes as previous commit, but made to master version of code, rather than older one I was working with.

Update CONTRIBUTORS.md

d7700fe

Update catalog_to_dd.py

1c3c3e2

Comma error removed

calum-chamberlain reviewed Sep 26, 2022

View reviewed changes

calum-chamberlain added enhancement utils.catalog_to_dd labels Sep 26, 2022

calum-chamberlain added this to the 0.5.0 milestone Sep 26, 2022

Merge branch 'develop' into erhm_existing_correlations

8abf616

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

catalog_to_dd.py: input and output of dt.cc #516

catalog_to_dd.py: input and output of dt.cc #516

erhmestel commented Sep 24, 2022

calum-chamberlain Sep 26, 2022

calum-chamberlain commented Sep 26, 2022 •

edited

catalog_to_dd.py: input and output of dt.cc #516

Are you sure you want to change the base?

catalog_to_dd.py: input and output of dt.cc #516

Conversation

erhmestel commented Sep 24, 2022

What does this PR do?

Why was it initiated? Any relevant Issues?

PR Checklist

calum-chamberlain Sep 26, 2022

Choose a reason for hiding this comment

calum-chamberlain commented Sep 26, 2022 • edited

calum-chamberlain commented Sep 26, 2022 •

edited