Issue when fetching large database #31

JejM · 2019-11-14T16:13:53Z

When trying to collect the whole ITS2 database for Viridiplantae, the process breaks at batch 22300. It broke at different batches during previous trials (e.g. 23600). Possibly this is outside the scope of BCdatabaser and it cannot handle such a large download.

Specifics:
Ubuntu 18.04.3
BCdatabaser is run through docker and set up according to instructions
primer file is identical to the one provided here (i.e. Sickel et al. 2015)
attached the log file: bcdatabaser.log

iimog · 2019-11-14T16:57:00Z

Hi @JejM, thanks for reporting this. Sorry to hear that you have trouble creating the database you want. In general, bcdatabaser imposes no limit on the database size. However, we have had problems with network connections to NCBI, especially when sending many or large requests in a short time. Unfortunately, bcdatabaser is not yet very robust against these network problems (see #16). We plan to work on a mechanism to re-try failed batches, but this is not yet implemented. Currently, if a single batch fails (and there are >2000 batches to download in your case), the whole bcdatabaser run fails.
One thing you can try to verify that it is indeed a temporary issue is to docker exec into your docker container and re-run the last command to see whether it succeeds this time or whether it produces a reproducible error:

tail -n+22201 viridiplantae.its2.14-11-19_trimmed/list.filtered.txt | head -n 100 | cut -f1 | epost -db nuccore | efetch -format fasta >>viridiplantae.its2.14-11-19_trimmed/sequences.fa

Let me know if it is another error that we can work on to fix, otherwise feel free to add your 👍 to #16 to increase its priority.
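
Until such a re-try mechanism exists, a failed batch can be repeated by hand. The following is only a minimal sketch of that idea, not the mechanism bcdatabaser uses or plans to use: the helper names `retry` and `fetch_batch`, the number of attempts, and the delay are all assumptions made for illustration. It wraps the same `tail | head | cut | epost | efetch` pipeline as above and writes to a temporary file first, so a failed attempt does not leave partial output in sequences.fa.

```shell
# Hypothetical helper (not part of bcdatabaser): run a command until it
# succeeds, at most MAX_TRIES times, sleeping DELAY seconds between attempts.
# Usage: retry MAX_TRIES DELAY command [args...]
retry() {
    local max_tries=$1 delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Hypothetical helper: fetch 100 IDs starting at line START of LIST and
# append the FASTA records to OUT. The batch counts as failed if efetch
# exits non-zero or returns nothing (an assumption about how a network
# failure shows up).
# Usage: fetch_batch LIST START OUT
fetch_batch() {
    local list=$1 start=$2 out=$3
    local tmp
    tmp=$(mktemp) || return 1
    if tail -n "+$start" "$list" | head -n 100 | cut -f1 \
        | epost -db nuccore | efetch -format fasta >"$tmp" && [ -s "$tmp" ]; then
        # Only append once the whole batch has arrived.
        cat "$tmp" >>"$out"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (run inside the container, where epost and efetch are available):
# retry 5 30 fetch_batch viridiplantae.its2.14-11-19_trimmed/list.filtered.txt 22201 viridiplantae.its2.14-11-19_trimmed/sequences.fa
```

The pause between attempts is there because the failures seem to be linked to many requests in a short time; a suitable value would have to be found by trial.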

JejM · 2019-11-15T12:02:01Z

Thanks for the feedback. Reading the other issues properly would have prevented the duplicate, apologies. When I re-ran the fetch, but only taking 1 replicate of every taxon, it finished correctly. So, as you said, it is most likely related to network issues with NCBI. With a little 'luck', fetching large databases can still work with the current version of bcdatabaser. It is definitely the most comfortable method out there. Thank you for this.

chiras · 2019-11-15T12:18:39Z

@JejM You can also download this dataset: https://zenodo.org/record/3339029#.Xc6XPC1oTKg. A full ITS2 plant dataset is already deposited there; it was generated with the BCdatabaser using the web default settings.

iimog added a commit that referenced this issue Nov 2, 2021

Merge pull request #36 from LasKru/master

dd109b0

Retry to download failed batches, Issues #16 & #31
