Extremely long sync with some DBs (e.g. BU) #727

Open
serbinsh opened this issue Dec 18, 2021 · 5 comments

Comments

@serbinsh
Member

Pretty sure I have raised this before, but when our DB has gone down for a period of time and I re-enable the sync, syncing with BU takes (or can take) a very long time (hours). Even during normal operation, the sync sometimes doesn't finish before my next sync cron job starts.

Does anyone have suggestions on how they keep their DBs synced while avoiding jobs that haven't completed before the next one starts? Should I create a separate sync with BU that runs less frequently than, say, the syncs with WI and UIUC?

Feedback requested.
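One common way to keep a cron-triggered sync from starting while the previous run is still going is a non-blocking file lock: the new job simply exits if the lock is held. Below is a minimal Python sketch, assuming a Linux or macOS host; the lock path and the sync command are hypothetical placeholders, not part of BETY's tooling. The same effect can be had directly in a crontab entry with util-linux's `flock -n <lockfile> <command>`.

```python
import fcntl
import subprocess
import sys
from typing import Optional, Sequence


def run_exclusively(lock_path: str, cmd: Sequence[str]) -> Optional[int]:
    """Run cmd only if nobody else holds lock_path.

    Returns the command's exit code, or None if the run was skipped
    because a previous sync still holds the lock.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Exclusive, non-blocking: fails immediately instead of queuing
            # behind a sync that is still running.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            print(f"previous sync still running ({lock_path}); skipping",
                  file=sys.stderr)
            return None
        # The lock is released when lock_file is closed at the end of the block.
        return subprocess.run(cmd).returncode
```

A cron entry would then call a small wrapper such as `run_exclusively("/tmp/bety-sync.lock", ["/path/to/your-sync-script.sh"])` (both paths hypothetical). Skipping rather than waiting means a slow BU sync delays the next run instead of piling up overlapping jobs.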

@robkooper
Member

For Illinois I switched to running it at midnight; that way the database is synced by the next morning. This is something I want to look at in the future (time permitting). The question is: do you need to sync so frequently? It's probably worth checking how frequently things change.

@dlebauer
Member

dlebauer commented Dec 18, 2021 via email

@robkooper
Member

@dlebauer that is a good idea, we can skip the runs table, which will be the biggest set. That is a quick (famous last words) fix.

@serbinsh
Member Author

I have also seen errors like this on some occasions:

---- BU
URL with bety dump                 : https://psql-pecan.bu.edu/sync/dump/bety.tar.gz
Remote start ID                    :  1000000001
Remote end ID                      :  1999999999
Local start ID                     :  2000000001
Local end ID                       :  2999999999

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
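The `unexpected end of file` / `Unexpected EOF in archive` messages above are what gzip and tar report when the archive they are reading is truncated, possibly because the download was cut off. A minimal sketch, assuming the dump has already been downloaded to a local file (the function name and path are hypothetical), that reads the whole archive before anything is loaded, so a partial download can be retried instead of aborting partway through the sync:

```python
import tarfile
import zlib


def archive_is_complete(path: str) -> bool:
    """Return True only if every member of the .tar.gz can be read to the end."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    stream = tar.extractfile(member)
                    # Read in 1 MiB chunks; a truncated gzip stream raises
                    # EOFError once the data runs out.
                    while stream.read(1 << 20):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        # Truncated, corrupt, or missing archive.
        return False
```

A wrapper could call this on the downloaded file (e.g. a local copy of `bety.tar.gz`) and re-download a few times before giving up on that site for this run.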

The rationale for syncing more often is that, for collaborative projects across institutions (e.g., NASA CMS), there is a need to share updated posteriors etc. Waiting 24 hours between syncs could be an issue.

I am game to try a more complicated sync where we sync what we can more often and only sync, say, the runs table once a day or so.
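A minimal sketch of that tiered schedule, assuming the sync step can be told which tables to load (the thread does not say whether the current sync script supports this). `runs` is in the daily tier per the suggestion to skip it; `likelihoods` is added only because it has the largest row count in the BU log in this thread, which is an assumption about what is slow. The `DAILY_ONLY` set and `tables_to_sync` function are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List

# Hypothetical "heavy" tier, synced only once a day.
DAILY_ONLY = {"runs", "likelihoods"}


def tables_to_sync(all_tables: Iterable[str], now: datetime,
                   daily_hour: int = 0) -> List[str]:
    """Return every table during daily_hour, otherwise all but the heavy tier."""
    if now.hour == daily_hour:
        return list(all_tables)
    return [t for t in all_tables if t not in DAILY_ONLY]
```

One caveat with this design: tables that reference `runs` (for example `inputs_runs`) may also need to move to the daily tier, so that the frequent syncs do not copy rows pointing at runs that have not arrived yet.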

@serbinsh
Member Author

I think so, or at least there is still an issue, though perhaps not the same one as the missing file. The sync log is below; it looks like the sync still times out before it gets through all of the tables.

Syncing BETYdb

Mon 24 Jan 2022 03:00:01 AM EST

/data/home/sserbin

---- BU
URL with bety dump                 : https://psql-pecan.bu.edu/sync/dump/bety.tar.gz
Remote start ID                    :  1000000001
Remote end ID                      :  1999999999
Local start ID                     :  2000000001
Local end ID                       :  2999999999
Checking schema                    : MATCHED SCHEMA version 9d0f7330c9ef2f0572c0bbbfa463a59c
Started psql (pid=1535141)
Updated  formats                   :          77
Updated  machines                  :           5
Updated  mimetypes                 :           3
Updated  users                     :          51
Updated  attributes                :        4492
Updated  benchmarks                :          73
Updated  citations                 :         153
Updated  covariates                :          26
Updated  ensembles                 :       31042 (+1)
Updated  inputs                    :      521824 (+398)
Updated  likelihoods               :     4254893
Updated  managements               :           2
Updated  metrics                   :          13
Updated  methods                   :           1
Updated  models                    :          33
Updated  modeltypes                :          15
Updated  pfts                      :         200
Updated  posteriors                :       22611 (+1)
Updated  priors                    :         528
Updated  reference_runs            :         121
Updated  runs                      :     3043167 (+100)
Updated  sites                     :       19486
Updated  species                   :       21022
Updated  treatments                :          10
Updated  variables                 :         381
Updated  workflows                 :       16636 (+2)
Updated  sitegroups                :          27
Updated  dbfiles                   :      624300 (+572)
Updated  traits                    :        1750
Updated  benchmarks_benchmarks_reference_runs :         366
Updated  benchmarks_ensembles      :          84
Updated  benchmarks_ensembles_scores :        3349
Updated  benchmarks_metrics        :         631
Updated  citations_sites           :         254
Updated  citations_treatments      :           7
Updated  formats_variables         :         183
Updated  inputs_runs               :         691
Updated  managements_treatments    :           1
Updated  modeltypes_formats        :          18
Updated  pfts_priors               :        2215
Updated  pfts_species              :      181463
Updated  posteriors_ensembles      :      691855 (+1)
Updated  sitegroups_sites          :       19731
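To follow up on checking how frequently things change, the per-table lines in a log like the one above can be parsed to see which tables actually gained rows in a given run. A minimal sketch; reading the first number as the table's row count and `(+N)` as rows added in this run is an assumption about the log format, and `parse_sync_log` is a hypothetical helper:

```python
import re
from typing import Dict, Tuple

# Matches lines like "Updated  runs   :     3043167 (+100)"; "(+N)" is optional.
LINE_RE = re.compile(r"^Updated\s+(\w+)\s*:\s*(\d+)(?:\s*\(\+(\d+)\))?\s*$")


def parse_sync_log(text: str) -> Dict[str, Tuple[int, int]]:
    """Map table name -> (row count, rows added in this run)."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            added = int(m.group(3)) if m.group(3) else 0
            stats[m.group(1)] = (int(m.group(2)), added)
    return stats
```

Collecting this over a week of logs, e.g. `{t: a for t, (n, a) in parse_sync_log(log_text).items() if a}`, would show which tables change often enough to justify frequent syncs.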
