How to speed up / debug deduplicating VCF step? #101

Open
jbalberge opened this issue Jun 4, 2021 · 13 comments

@jbalberge

Running svaba on Terra/Firecloud, we are having trouble at this step of the svaba run for >30X tumor/normal WGS (from the logs):

...vcf - reading in the breakpoints file
...vcf sizeof empty VCFEntryPair 64 bytes
...read in 2,325,969 indels and 1,718,531 SVs
...vcf - deduplicating 1,718,531 events
...dedupe at 0 of 1,718,531

Actual run time is a couple of hours for variant calling, but then the logs get stuck at the dedupe step for 100+ hours and counting. Have you seen this before? Is there anything we can do to debug this situation?

We tried VMs with up to 128 GB RAM, 16 CPUs, and 1000 GB HDD, using the svaba 1.1.0 Quay Docker image.

Thanks for your help!

@walaj
Owner

walaj commented Jun 4, 2021

This is unusual behavior, but I do have a suggestion. There was an issue in the dedupe step that another user pointed out in issue #92, which I fixed in version 1.1.3.

Is there a contact person at the Broad Institute who is in charge of maintaining the svaba Docker image? You could reach out to them to update svaba to the current version here on GitHub, 1.1.3.

@jbalberge
Author

Thank you for the quick reply. I used the BioContainers Docker image for 1.1.0 available at https://biocontainers.pro/tools/svaba
Unfortunately, upgrading the Docker image to v1.1.3 didn't solve the problem.
Could it be that the number of events is too high?

@ahwanpandey

@jbalberge we are having the same issue, stuck for 100+ hours at this step for a couple of samples. Did you manage to fix it?

...vcf - reading in the breakpoints file
...vcf sizeof empty VCFEntryPair 64 bytes
...read in 1,104,990 indels and 1,739,919 SVs
...vcf - deduplicating 1,739,919 events
...dedupe at 0 of 1,739,919

The SvABA version we have been using is from some time ago. We have successfully processed hundreds of samples with it, but now a couple of samples are just stuck. We could update to a newer version and rerun only the problem samples, but we're not sure that would fix the issue, and then the cohort would no longer be "harmonised".

Program: SvABA
FH Version: 134
Contact: Jeremiah Wala [ jwala@broadinstitute.org ]

The cohorts we've analysed have germlines sequenced at ~30x and tumors from 60x to 120x.

The two problem samples have germlines at 30x and tumors at ~70x and ~120x. Both have been stuck at the dedupe step for 100+ hours. We have given it 200 GB for the run.

@walaj
Owner

walaj commented Jan 31, 2024

@ahwanpandey This is one of the memory/runtime weaknesses of svaba that I've known about but haven't had time to fix. The issue is that svaba compiles all of the variants into an intermediate file, and this file needs to be sorted and de-duplicated at the end to produce the organized VCF. For most runs this is fine, but if the number of suspected variants is high (in your case it is very high), then memory usage can grow very large as it tries to read this entire file in.

The real solution is to do what samtools sort does and perform a scatter-gather sort, but I haven't been able to implement that yet.
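
For illustration, here is a minimal, self-contained C++ sketch of that scatter-gather pattern (not svaba's actual implementation): sort chunks that fit in memory, spill each chunk to a temporary file, then stream-merge the sorted chunks and drop duplicates during the merge. The file names, chunk size, and whole-line duplicate key below are hypothetical and are not tied to the bps.txt.gz format or svaba's VCFEntryPair comparison.

// Sketch only: external (scatter-gather) sort + dedup, the same idea samtools
// sort uses. File names, chunk size, and the duplicate key are hypothetical.
#include <algorithm>
#include <cstdio>
#include <fstream>
#include <queue>
#include <string>
#include <vector>

// Scatter: sort each in-memory chunk of lines and spill it to its own temp file.
static std::vector<std::string> scatter(std::istream& in, size_t chunk_lines) {
  std::vector<std::string> files;
  std::vector<std::string> buf;
  std::string line;
  auto flush = [&]() {
    if (buf.empty()) return;
    std::sort(buf.begin(), buf.end());  // whole-line lexicographic sort for simplicity
    std::string name = "chunk." + std::to_string(files.size()) + ".tmp";
    std::ofstream out(name);
    for (const std::string& s : buf) out << s << '\n';
    files.push_back(name);
    buf.clear();
  };
  while (std::getline(in, line)) {
    buf.push_back(line);
    if (buf.size() >= chunk_lines) flush();
  }
  flush();
  return files;
}

// Gather: k-way merge of the sorted chunks, emitting each distinct line once.
static void gather(const std::vector<std::string>& files, std::ostream& out) {
  struct Item { std::string line; size_t src; };
  auto cmp = [](const Item& a, const Item& b) { return a.line > b.line; };  // min-heap
  std::priority_queue<Item, std::vector<Item>, decltype(cmp)> heap(cmp);

  std::vector<std::ifstream> readers;
  for (const std::string& f : files) readers.emplace_back(f);
  for (size_t i = 0; i < readers.size(); ++i) {
    std::string line;
    if (std::getline(readers[i], line)) heap.push({line, i});
  }

  std::string last;
  bool first = true;
  while (!heap.empty()) {
    Item top = heap.top();
    heap.pop();
    if (first || top.line != last) {  // duplicates pop adjacently in a sorted merge
      out << top.line << '\n';
      last = top.line;
      first = false;
    }
    std::string next;
    if (std::getline(readers[top.src], next)) heap.push({next, top.src});
  }
  for (const std::string& f : files) std::remove(f.c_str());  // clean up temp chunks
}

int main() {
  std::ifstream in("breakpoints.txt");               // hypothetical flat breakpoint table
  std::ofstream out("breakpoints.sorted.dedup.txt");
  gather(scatter(in, 500000), out);                  // keep at most ~500k lines in memory
}

With this approach, peak memory is bounded by the chunk size rather than the total number of candidate variants, at the cost of the temporary chunk files on disk.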

Out of curiosity, how large is the *.bps.txt.gz file for this run? That's the file that it is reading into memory.

@ahwanpandey

Hi @walaj, thanks so much for your response. For the two samples that are stuck, the *.bps.txt.gz files are 147 MB and 131 MB.

We have a lot of high-grade ovarian cancer WGS data, and these samples indeed have a lot of structural variants. Is there any chance you would be able to fix this issue for us? We would be very grateful, and I can share the files if that would be useful. We have already run svaba on hundreds of samples over the years, and as you can understand it would be tricky not to be able to run the tool on a couple of samples, and probably more in the future. So again, we would be very grateful if you could have a look at fixing the issue when you get a chance.

The other option we are trying is to run the latest version of the tool. Do you think we will have the same problem with it?

I'm trying to install the latest version, but as you've noted, I think I need to fix what CMake is doing.
#132

@jbalberge
Author

jbalberge commented Jan 31, 2024 via email

@walaj
Owner

walaj commented Jan 31, 2024 via email

@ahwanpandey

ahwanpandey commented Feb 3, 2024

Hi @walaj, I've now tried to re-run with the latest version and it still got stuck at the dedupe step for two samples :/ Would it be possible for you to look into fixing this issue for us? I can share any files you need. We would be very grateful for your time in fixing this bug.

Stuck at the following step for two out of hundreds of WGS samples.

==> AN_T_65913_1600143_21_N_65913_GL/std_out_err_AN/WGS.SvABA.STAGE0.SvABA.AN_T_65913_1600143_21_N_65913_GL.new.17918181.papr-res-compute215.err <==
-----------------------------------------------------------
---  Running svaba SV and indel detection on 8 threads ----
---    (inspect *.log for real-time progress updates)   ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
...vcf - reading in the breakpoints file
...vcf sizeof empty VCFEntryPair 64 bytes
...read in 1,104,585 indels and 1,596,340 SVs
...vcf - deduplicating 1,596,340 events
...dedupe at 0 of 1,596,340

==> AN_T_66639_2100027_16_N_66639_GL/std_out_err_AN/WGS.SvABA.STAGE0.SvABA.AN_T_66639_2100027_16_N_66639_GL.new.17918182.papr-res-compute06.err <==
-----------------------------------------------------------
---  Running svaba SV and indel detection on 8 threads ----
---    (inspect *.log for real-time progress updates)   ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
...vcf - reading in the breakpoints file
...vcf sizeof empty VCFEntryPair 64 bytes
...read in 1,074,831 indels and 1,282,024 SVs
...vcf - deduplicating 1,282,024 events
...dedupe at 0 of 1,282,024

The output directory contents so far

[screenshot of the output directory contents]

Latest SVABA VERSION where issue persists

------------------------------------------------------------
-------- SvABA - SV and indel detection by assembly --------
------------------------------------------------------------
Program: SvABA
Version: 1.1.3
Contact: Jeremiah Wala [ jeremiah.wala@gmail.org ]
Usage: svaba <command> [options]

Commands:
           run            Run SvABA SV and Indel detection on BAM(s)
           refilter       Refilter the SvABA breakpoints with additional/different criteria to created filtered VCF and breakpoints file.

Report bugs to jwala@broadinstitute.org

Old version where issue was first observed

------------------------------------------------------------
--- SvABA (sah-bah) - SV and indel detection by assembly ---
------------------------------------------------------------
Program: SvABA
FH Version: 134
Contact: Jeremiah Wala [ jwala@broadinstitute.org ]
Usage: svaba <command> [options]

Commands:
           run            Run SvABA SV and Indel detection on BAM(s)
           refilter       Refilter the SvABA breakpoints with additional/different criteria to created filtered VCF and breakpoints file.

Report bugs to jwala@broadinstitute.org

@ahwanpandey

@walaj is there any chance you could have a look at this issue for us? We would be very grateful for the help. Thanks so much.

@walaj
Owner

walaj commented Mar 22, 2024

This is fixed in the latest commit (d9f37dbc40ed783b5758389405113ac2a0dfbd82)

@ahwanpandey

ahwanpandey commented May 8, 2024

@walaj Thanks for all the help so far.

I have now downloaded the latest commit and processed some old samples using both the old version (as mentioned in this issue) and the latest commit (fcfa17e). The results are drastically different in the number of passing somatic SVs. See the plot below, summarized per chromosome across two samples (latest commit results in orange bars):

[bar plot: passing somatic SV counts per chromosome, old version vs. latest commit]

I noticed that the new commit's log file contains lots of messages saying "with limit hit of 0", whereas the old version's log has far fewer such messages. I'm not sure if this is related. I also ran the new version with 16 threads instead of the 8 used for the old version; I'll try running with 8 threads and see whether that changes anything. Do you have any ideas? Thanks again.

OLD VERSION

]$ cat AN_T_66639_2100027_14_N_66639_GL.log | grep "with limit hit of" | head -n 40
writing contigs etc on thread 140475294115584 with limit hit of 796
writing contigs etc on thread 140475302508288 with limit hit of 474
writing contigs etc on thread 140475260544768 with limit hit of 1353
writing contigs etc on thread 140475285722880 with limit hit of 2536
writing contigs etc on thread 140475277330176 with limit hit of 2743
writing contigs etc on thread 140469615314688 with limit hit of 3811
writing contigs etc on thread 140475294115584 with limit hit of 336
writing contigs etc on thread 140475268937472 with limit hit of 1780
writing contigs etc on thread 140475302508288 with limit hit of 307
writing contigs etc on thread 140475310900992 with limit hit of 1795
writing contigs etc on thread 140475285722880 with limit hit of 552
writing contigs etc on thread 140475277330176 with limit hit of 916
writing contigs etc on thread 140475302508288 with limit hit of 574
writing contigs etc on thread 140475310900992 with limit hit of 437
writing contigs etc on thread 140475260544768 with limit hit of 1059
writing contigs etc on thread 140475285722880 with limit hit of 1293
writing contigs etc on thread 140475268937472 with limit hit of 2951
writing contigs etc on thread 140475294115584 with limit hit of 4241
writing contigs etc on thread 140469615314688 with limit hit of 5049
writing contigs etc on thread 140475302508288 with limit hit of 8076
writing contigs etc on thread 140475310900992 with limit hit of 4492
writing contigs etc on thread 140475277330176 with limit hit of 5499
writing contigs etc on thread 140475294115584 with limit hit of 6412
writing contigs etc on thread 140475268937472 with limit hit of 5956
writing contigs etc on thread 140475285722880 with limit hit of 16232
writing contigs etc on thread 140475260544768 with limit hit of 15423
writing contigs etc on thread 140469615314688 with limit hit of 7244
writing contigs etc on thread 140475302508288 with limit hit of 6837
writing contigs etc on thread 140475310900992 with limit hit of 8440
writing contigs etc on thread 140475268937472 with limit hit of 8838
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475294115584 with limit hit of 13428
writing contigs etc on thread 140475285722880 with limit hit of 8048
writing contigs etc on thread 140475277330176 with limit hit of 11336
writing contigs etc on thread 140469615314688 with limit hit of 7874
writing contigs etc on thread 140475310900992 with limit hit of 8119
writing contigs etc on thread 140475302508288 with limit hit of 8213

LATEST COMMIT

]$ cat AN_T_66639_2100027_14_N_66639_GL.log | grep "with limit hit of" | head -n 40
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
writing contigs etc on thread 139874581395200 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139874465990400 with limit hit of 0
writing contigs etc on thread 139874573002496 with limit hit of 0
writing contigs etc on thread 139868867270400 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868875663104 with limit hit of 0
writing contigs etc on thread 139874482775808 with limit hit of 0
writing contigs etc on thread 139868842092288 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139874474383104 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139874581395200 with limit hit of 0
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139874573002496 with limit hit of 0
writing contigs etc on thread 139874465990400 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868867270400 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139868842092288 with limit hit of 0
writing contigs etc on thread 139874482775808 with limit hit of 0
writing contigs etc on thread 139874474383104 with limit hit of 0
writing contigs etc on thread 139868875663104 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0

@walaj
Owner

walaj commented May 19, 2024 via email

@ahwanpandey

Hi @walaj

Thanks for fixing this. I think everything looks good now in (63ffa29)!

Thanks again for all the help.

Best,
Ahwan

